Metam: Goal-oriented data discovery
Data is a central component of machine learning and causal inference tasks. The availability
of large amounts of data from sources such as open data repositories, data lakes and data …
Observatory: Characterizing embeddings of relational tables
Language models and specialized table embedding models have recently demonstrated
strong performance on many tasks over tabular data. Researchers and practitioners are …
Retrieve, merge, predict: Augmenting tables with data lakes
Machine-learning from a disparate set of tables, a data lake, requires assembling features
by merging and aggregating tables. Data discovery can extend autoML to data tables by …
Warpgate: A semantic join discovery system for cloud data warehouses
Data discovery is a major challenge in enterprise data analysis: users often struggle to find
data relevant to their analysis goals or even to navigate through data across data sources …
UniDM: A Unified Framework for Data Manipulation with Large Language Models
Designing effective data manipulation methods is a long-standing problem in data lakes.
Traditional methods, which rely on rules or machine learning models, require extensive …
Towards an architecture to support data access in research data spaces
Using data from different data sources is a common procedure in data-driven research. As
required data is often not available from centrally managed sources, the concept of data …
Suggesting assess queries for interactive analysis of multidimensional data
Assessment is the process of comparing the actual to the expected behavior of a business
phenomenon and judging the outcome of the comparison. The querying operator has been …
FREYJA: Efficient Join Discovery in Data Lakes
Data lakes are massive repositories of raw and heterogeneous data, designed to meet the
requirements of modern data storage. Nonetheless, this same philosophy increases the …
It Took Longer than I was Expecting: Why is Dataset Search Still so Hard?
Dataset search is a long-standing problem across both industry and academia. While most
industry tools focus on identifying one or more datasets matching a user-specified query …
[BOOK][B] Table Representation Learning
M Hulsebos - 2024 - pure.uva.nl
The increasing amount of data being collected, stored, and analyzed, induces a need for
efficient, scalable, and robust methods to handle this data. Representation learning, i.e., the …