- Academic Search

Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org

Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

CHORUS: foundation models for unified data discovery and exploration

M Kayali, A Lykov, I Fountalis, N Vasiloglou… - arXiv preprint arXiv …, 2023 - arxiv.org

We explore the application of foundation models to data discovery and exploration tasks.
Foundation models are large language models (LLMs) that show promising performance on …

TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

A Khatiwada, H Kokel, I Abdelaziz… - arXiv preprint arXiv …, 2024 - arxiv.org

Enterprises have a growing need to identify relevant tables in data lakes; e.g., tables that are
unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such …

Retrieve, merge, predict: Augmenting tables with data lakes

R Cappuzzo, A Coelho, F Lefebvre, P Papotti… - arXiv preprint arXiv …, 2024 - arxiv.org

Machine-learning from a disparate set of tables, a data lake, requires assembling features
by merging and aggregating tables. Data discovery can extend autoML to data tables by …

A Survey on Data Markets

J Zhang, Y Bi, M Cheng, J Liu, K Ren, Q Sun… - arXiv preprint arXiv …, 2024 - arxiv.org

Data is the new oil of the 21st century. The growing trend of trading data for greater welfare
has led to the emergence of data markets. A data market is any mechanism whereby the …

Towards Accurate and Efficient Document Analytics with Large Language Models

Y Lin, M Hulsebos, R Ma, S Shankar… - arXiv preprint arXiv …, 2024 - arxiv.org

Unstructured data formats account for over 80% of the data currently stored, and extracting
value from such formats remains a considerable challenge. In particular, current approaches …

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes

Y Deng, C Chai, L Cao, Q Yuan, S Chen, Y Yu… - Proceedings of the …, 2024 - dl.acm.org

Discovering tables from poorly maintained data lakes is a significant challenge in data
management. Two key tasks are identifying joinable and unionable tables, crucial for data …

Searching Data Lakes for Nested and Joined Data

Y Zhang, PB Chen, ZG Ives - Proceedings of the VLDB Endowment, 2024 - dl.acm.org

Exploratory data science is driving new platforms that assist data scientists with everyday
tasks, such as integration and wrangling, to assemble training datasets. Such tools take …

Graph Machine Learning Meets Multi-Table Relational Data

Q Gan, M Wang, D Wipf, C Faloutsos - Proceedings of the 30th ACM …, 2024 - dl.acm.org

While graph machine learning, and notably graph neural networks (GNNs), have gained
immense traction in recent years, application is predicated on access to a known input graph …

NumJoin: Discovering Numeric Joinable Tables with Semantically Related Columns

P Subramaniam, U Khurana, K Srinivas… - Proceedings of the …, 2023 - dl.acm.org

Join discovery is a crucial part of exploration on data lakes. It often involves finding joinable
tables that are semantically relevant. However, data lakes often contain numeric tables with …

Deepjoin: Joinable table discovery with pre-trained language models

Tabular data augmentation for machine learning: Progress and prospects of embracing generative ai
