A comprehensive survey on automatic knowledge graph construction

L Zhong, J Wu, Q Li, H Peng, X Wu - ACM Computing Surveys, 2023 - dl.acm.org
Automatic knowledge graph construction aims at manufacturing structured human
knowledge. To this end, much effort has historically been spent extracting informative fact …

A survey of deep active learning

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM Computing …, 2021 - dl.acm.org
Active learning (AL) attempts to maximize a model's performance gain while annotating the
fewest samples possible. Deep learning (DL) is greedy for data and requires a large amount …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

Neo: A learned query optimizer

R Marcus, P Negi, H Mao, C Zhang, M Alizadeh… - arXiv preprint arXiv …, 2019 - arxiv.org
Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …

Data-driven materials research enabled by natural language processing and information extraction

EA Olivetti, JM Cole, E Kim, O Kononova… - Applied Physics …, 2020 - pubs.aip.org
Given the emergence of data science and machine learning throughout all aspects of
society, but particularly in the scientific domain, there is increased importance placed on …

Table-GPT: Table-tuned GPT for diverse table tasks

P Li, Y He, D Yashar, W Cui, S Ge, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to
follow diverse human instructions and perform a wide range of tasks. However, when …

A benchmarking study of embedding-based entity alignment for knowledge graphs

Z Sun, Q Zhang, W Hu, C Wang, M Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the
same real-world object. Recent advancement in KG embedding impels the advent of …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Table-GPT: Table fine-tuned GPT for diverse table tasks

P Li, Y He, D Yashar, W Cui, S Ge, H Zhang… - Proceedings of the …, 2024 - dl.acm.org
Language models, such as GPT-3 and ChatGPT, demonstrate remarkable abilities to follow
diverse human instructions and perform a wide range of tasks, using instruction fine-tuning …