Query optimization in relational databases relies heavily on cardinality estimation, i.e., estimating the result size of a query. With recent developments in Machine Learning, researchers and practitioners see Learned Cardinality Estimation as an alternative way to produce accurate estimates. In this approach, cardinality estimation is formulated as a learning task (e.g., regression), where query features serve as input and the estimated cardinality is the output. This thesis aims to investigate, evaluate, and compare current Machine Learning approaches for Cardinality Estimation under distinct settings (e.g., dataset imbalance, different feature extraction techniques, diverse learning algorithms).
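As a toy illustration of the regression formulation, the sketch below assumes a single-column table with uniformly distributed integer values, range predicates `lo <= x < hi`, and a hypothetical featurization (a bias term plus the normalized bounds). It fits an ordinary least-squares model in pure Python. Practical learned estimators typically use richer features, regress on log-cardinality, and use nonlinear models, so this only shows the input/output setup, not any approach the thesis will evaluate.

```python
import random

def featurize(lo, hi, domain=1000):
    """Hypothetical featurization of `lo <= x < hi`: bias term plus normalized bounds."""
    return [1.0, lo / domain, hi / domain]

def true_cardinality(table, lo, hi):
    """Ground-truth label: number of rows satisfying the range predicate."""
    return sum(1 for v in table if lo <= v < hi)

def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y with Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= f * A[col][c]
            b[row] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def estimate(w, features):
    return sum(wi * xi for wi, xi in zip(w, features))

random.seed(0)
table = list(range(1000))  # toy single-column table, values 0..999
queries = []
for _ in range(200):
    lo = random.randint(0, 900)
    queries.append((lo, random.randint(lo + 1, 1000)))
X = [featurize(lo, hi) for lo, hi in queries]
y = [true_cardinality(table, lo, hi) for lo, hi in queries]
w = fit_least_squares(X, y)
est = estimate(w, featurize(100, 350))
print(round(est))  # prints 250, the true cardinality of 100 <= x < 350
```

Because the true cardinality in this toy table is exactly linear in the bounds, the linear model recovers it; on real data with correlated columns and joins, feature extraction and model choice start to matter.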
The digitalization of power grids gives rise to smart grids, which generate massive amounts of data. This data volume poses significant data management challenges, such as proper data exchange and data governance. To alleviate these challenges, researchers view data marketplaces as a means to support better data management among data owners in smart grids. This thesis aims to investigate the synergies between data marketplaces and smart grids.
This thesis aims to enable data scientists and researchers to quickly use Self-Organizing Maps (SOMs) in their data exploration processes. To this end, a program was developed that creates interactive visualizations based on such maps. Users can select regions of interest on a map to extract the corresponding data subset. In addition, the visualizations can reveal relationships between different features, which helps prepare datasets for various data analysis or machine learning projects. A client-server architecture was chosen so that JavaScript handles the interactive visualizations while Python handles the data manipulation tasks. Different relational databases can be used as input.
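To make the region-selection idea concrete, the following is a minimal sketch of an online SOM in pure Python (Gaussian neighborhood, linearly decaying learning rate and radius) together with a hypothetical `select_region` helper that returns the rows whose best-matching unit (BMU) lies in a rectangular block of map cells. The function names, parameters, and decay schedule are assumptions for illustration and not the program described above; a production back end would more likely rely on a vectorized library.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM training; learning rate and neighborhood radius decay linearly."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in random order
            frac = step / total
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3  # keep the radius strictly positive
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood on the grid around the BMU
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

def select_region(data, weights, r0, r1, c0, c1):
    """Hypothetical helper: rows whose BMU lies in the cell block [r0, r1] x [c0, c1]."""
    subset = []
    for x in data:
        r, c = best_matching_unit(weights, x)
        if r0 <= r <= r1 and c0 <= c <= c1:
            subset.append(x)
    return subset

# Two synthetic clusters standing in for rows fetched from a relational table
rng = random.Random(42)
data = [(rng.gauss(0.1, 0.05), rng.gauss(0.1, 0.05)) for _ in range(20)]
data += [(rng.gauss(0.9, 0.05), rng.gauss(0.9, 0.05)) for _ in range(20)]
weights = train_som(data, 5, 5)
r, c = best_matching_unit(weights, data[0])
subset = select_region(data, weights, r, r, c, c)
```

In an interactive tool, the rectangle `[r0, r1] x [c0, c1]` would come from the user's selection on the rendered map, and the returned subset would be the new dataset handed back to the client.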
Formulating SQL queries is challenging for the growing number of non-database experts (e.g., biologists, journalists, business administrators) who need to access and explore data. Query By Example (QBE) methods offer an alternative mechanism in which users retrieve information from large databases by providing data examples that characterize their intent, without having to write complex SQL queries. Traditional QBE methods such as SQLSynthesizer and TALOS treat QBE as a classification problem: machine learning models are trained to classify data objects in the database as positive or negative, depending on whether or not they match the data examples. However, both methods require data examples that fully characterize the user's intent; that is, they require a fully labeled training dataset. In practical QBE applications, users often provide only a small subset of the positive class. This thesis aims to explore efficient positive and unlabeled learning (PUL) techniques for QBE over large databases. Recent PUL techniques fit this particular setting of QBE well and are interesting alternatives to existing methods.
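One classic PUL technique, the classifier-adjustment approach of Elkan and Noto, shows how a few positive examples plus the unlabeled rest of a table can still yield a classifier: train a "labeled vs. unlabeled" model g(x), estimate c = p(labeled | positive) as the mean of g on held-out positives, and use g(x) / c as an estimate of p(positive | x). The sketch below is a pure-Python illustration under assumed synthetic data (two numeric attributes, hidden intent `x0 > 0.6`) with a hand-rolled logistic regression; it is swapped in as an example and is not necessarily the technique this thesis will adopt.

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, epochs=500, lr=1.0):
    """Batch gradient descent on the logistic loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - t
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def score(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def elkan_noto(positives, unlabeled, holdout_frac=0.2, seed=0):
    """Return a function estimating p(positive | x) from positive and unlabeled examples."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    k = max(1, int(len(pos) * holdout_frac))
    held, train_pos = pos[:k], pos[k:]
    # "non-traditional" classifier: labeled (s=1) vs. unlabeled (s=0)
    w = train_logreg(train_pos + unlabeled, [1] * len(train_pos) + [0] * len(unlabeled))
    c = sum(score(w, x) for x in held) / len(held)  # estimate of p(s=1 | y=1)
    return lambda x: min(1.0, score(w, x) / c)

# Synthetic "database" of tuples; the user's hidden intent is x0 > 0.6
rng = random.Random(1)
db = [(rng.random(), rng.random()) for _ in range(300)]
examples = [t for t in db if t[0] > 0.6][:30]  # the few positives the user supplies
example_set = set(examples)
unlabeled = [t for t in db if t not in example_set]
p_pos = elkan_noto(examples, unlabeled)
answer = [t for t in db if p_pos(t) >= 0.5]  # tuples returned to the user
```

The key point for QBE is that the unlabeled tuples are never treated as confirmed negatives; the constant c corrects for the fact that the user labeled only a fraction of the positives.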
This bachelor thesis investigates techniques to optimize and accelerate CNN inference on resource-constrained microcontrollers. The study explores model quantization, pruning, and compression methods that reduce memory and computational requirements while maintaining acceptable accuracy. By developing efficient CNN inference strategies tailored to microcontrollers, this thesis aims to enable real-time, energy-efficient deployment of deep learning models in edge computing applications.
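Two of the techniques named above can be sketched in a few lines: affine (asymmetric) per-tensor int8 quantization, which stores each weight in one byte plus a shared scale and zero point, and unstructured magnitude pruning, which zeroes the smallest-magnitude weights. This is an illustrative pure-Python sketch under those assumptions; the function names are hypothetical, and deployment toolchains for microcontrollers implement their own (often per-channel, integer-only) variants.

```python
def quantize_int8(weights):
    """Affine per-tensor quantization: w ~ (q - zero_point) * scale, q in [-128, 127]."""
    w_min = min(min(weights), 0.0)  # include 0.0 so it is exactly representable
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255 or 1.0  # guard against an all-zero tensor
    zero_point = round(-128 - w_min / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

weights = [-0.5, -0.1, 0.0, 0.2, 0.8]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
pruned = magnitude_prune([0.3, -0.05, 0.9, 0.01, -0.4], sparsity=0.4)
# pruned == [0.3, 0.0, 0.9, 0.0, -0.4]
```

Dequantizing recovers each weight to within half a quantization step (`scale / 2`), and 0.0 maps back exactly, which matters for zero padding and for weights removed by pruning. Storing `q` instead of 32-bit floats cuts weight memory roughly fourfold.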
The need to efficiently sell and purchase data has led to the emergence of platforms that act as intermediaries between data providers and data consumers and, at the same time, offer functionality for processing the offered data (e.g., cleansing, anonymizing, aggregating), typically before it is sold. These platforms are referred to in the literature as data marketplaces. One problem that arises when designing a marketplace for the commercialization of data and data-driven models concerns bias and fairness. This thesis investigates bias and fairness issues in the context of data marketplaces. In particular, it addresses the following central questions: (1) How can bias and fairness be defined in data marketplaces? (2) How can data providers and buyers be made aware of bias-related issues in datasets? (3) How do bias and fairness issues impact data pricing schemes?
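As one possible, deliberately simple illustration of question (2), a marketplace could attach a small bias report to each listed dataset. The sketch below computes two common group-fairness statistics of a labeled dataset with respect to a protected attribute: statistical parity difference and the disparate impact ratio. The function name, row format, and column names are hypothetical, and these metrics are only two of many possible definitions, so the choice of definition remains the open issue raised in question (1).

```python
def bias_report(rows, group_key, label_key, privileged, favorable=1):
    """Group-fairness statistics of a labeled dataset w.r.t. one protected attribute.

    Assumes both the privileged and the unprivileged group are non-empty.
    """
    def favorable_rate(in_group):
        subset = [r for r in rows if in_group(r[group_key])]
        return sum(1 for r in subset if r[label_key] == favorable) / len(subset)

    p_priv = favorable_rate(lambda g: g == privileged)
    p_unpriv = favorable_rate(lambda g: g != privileged)
    return {
        "statistical_parity_difference": p_unpriv - p_priv,  # 0.0 means parity
        "disparate_impact": p_unpriv / p_priv if p_priv else float("inf"),  # 1.0 means parity
    }

# Hypothetical loan dataset offered on a marketplace
rows = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "A", "approved": 1}, {"group": "A", "approved": 0},
    {"group": "B", "approved": 1}, {"group": "B", "approved": 0},
    {"group": "B", "approved": 0}, {"group": "B", "approved": 0},
]
report = bias_report(rows, "group", "approved", privileged="A")
# statistical_parity_difference: -0.5, disparate_impact: about 0.333
```

A report like this could be shown to buyers next to a listing, and whether such scores should raise or lower a dataset's price is the kind of question (3) points at.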
Query By Example (QBE) aims to make database interactions easier for non-technical users. With QBE, users can retrieve information from (large) databases by providing data examples that characterize their intent. The quality of the results produced by QBE systems critically depends on the examples the user provides. This Master's thesis project aims to explore the use of recommender-system-like techniques to improve the way non-technical users provide examples to QBE systems. Of particular interest is how to increase the diversity of the examples so that they better characterize the user's intent.
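One well-known diversity-aware re-ranking technique from information retrieval and recommender systems is Maximal Marginal Relevance (MMR), which greedily picks the candidate that best trades off relevance against similarity to the items already chosen. The sketch below applies it to suggesting example tuples to a QBE user; the relevance scores, the similarity function `1 / (1 + Euclidean distance)`, and the weight `lam` are assumptions for illustration, not a design decided by this project.

```python
import math

def similarity(a, b):
    """Assumed tuple similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def mmr_select(candidates, relevance, k, lam=0.5):
    """Greedy Maximal Marginal Relevance: balance relevance against redundancy."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def marginal(i):
            # redundancy = similarity to the most similar already-selected tuple
            redundancy = max((similarity(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]

candidates = [(0.0,), (0.05,), (1.0,)]
relevance = [1.0, 0.95, 0.6]
suggested = mmr_select(candidates, relevance, k=2)
```

With `lam = 0.5`, the near-duplicate `(0.05,)` loses to the less relevant but dissimilar `(1.0,)`; plain top-2 by relevance would have returned two nearly identical tuples. Setting `lam` to 1 recovers pure relevance ranking.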