Web table extraction, retrieval, and augmentation: A survey

S Zhang, K Balog - ACM Transactions on Intelligent Systems and …, 2020 - dl.acm.org
Tables are powerful and popular tools for organizing and manipulating data. A vast number
of tables can be found on the Web, which represent a valuable knowledge resource. The …

Jigsaw: Large language models meet program synthesis

N Jain, S Vaidyanath, A Iyer, N Natarajan… - Proceedings of the 44th …, 2022 - dl.acm.org
Large pre-trained language models such as GPT-3 [10], Codex [11], and Google's language
model [7] are now capable of generating code from natural language specifications of …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Table-GPT: Table-tuned GPT for diverse table tasks

P Li, Y He, D Yashar, W Cui, S Ge, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to
follow diverse human instructions and perform a wide range of tasks. However, when …

Auto-EM: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning

C Zhao, Y He - The World Wide Web Conference, 2019 - dl.acm.org
Entity matching (EM), also known as entity resolution, fuzzy join, and record linkage, refers to
the process of identifying records corresponding to the same real-world entities from …

Applications and challenges for large language models: From data management perspective

M Zhang, Z Ji, Z Luo, Y Wu… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Data management is indispensable for informed decision-making in the big data era. In the
meantime, Large Language Models (LLMs), equipped with billions of model parameters and …

AutoPandas: neural-backed generators for program synthesis

R Bavishi, C Lemieux, R Fox, K Sen… - Proceedings of the ACM on …, 2019 - dl.acm.org
Developers nowadays have to contend with a growing number of APIs. While in the long
term they are very useful to developers, many modern APIs have an incredibly steep …

Auto-Suggest: Learning-to-recommend data preparation steps using data science notebooks

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performances, and data scientists spend a
considerable amount of time on data cleaning before model training. However, to date, there …

Jellyfish: Instruction-tuning local large language models for data preprocessing

H Zhang, Y Dong, C Xiao… - Proceedings of the 2024 …, 2024 - aclanthology.org
This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the
data mining pipeline that transforms raw data into a clean format. We instruction-tune local …