Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI

L Cui, H Li, K Chen, L Shou, G Chen - arXiv preprint arXiv:2407.21523, 2024 - arxiv.org
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …

Table meets LLM: Can large language models understand structured table data? A benchmark and empirical study

Y Sui, M Zhou, M Zhou, S Han, D Zhang - Proceedings of the 17th ACM …, 2024 - dl.acm.org
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve
Natural Language (NL)-related tasks. However, there is still much to learn about how well …

HYTREL: Hypergraph-enhanced tabular data representation learning

P Chen, S Sarkar, L Lausen… - Advances in …, 2024 - proceedings.neurips.cc
Language models pretrained on large collections of tabular data have
demonstrated their effectiveness in several downstream tasks. However, many of these …

Entrant: A large financial dataset for table understanding

E Zavitsanos, D Mavroeidis, E Spyropoulou… - Scientific Data, 2024 - nature.com
Tabular data is a way to structure, organize, and present information conveniently and
effectively. Real-world tables present data in two dimensions by arranging cells in matrices …

SpaBERT: A pretrained language model from geographic data for geo-entity representation

Z Li, J Kim, YY Chiang, M Chen - arXiv preprint arXiv:2210.12213, 2022 - arxiv.org
Named geographic entities (geo-entities for short) are the building blocks of many
geographic datasets. Characterizing geo-entities is integral to various application domains …

A Large Scale Test Corpus for Semantic Table Search

A Leventidis, MP Christensen, M Lissandrini… - Proceedings of the 47th …, 2024 - dl.acm.org
Table search aims to answer a query with a ranked list of tables. Unfortunately, current test
corpora have focused mostly on needle-in-the-haystack tasks, where only a few tables are …

MGeo: Multi-modal geographic language model pre-training

R Ding, B Chen, P Xie, F Huang, X Li… - Proceedings of the 46th …, 2023 - dl.acm.org
Query and point of interest (POI) matching is a core task in location-based services (LBS),
e.g., navigation maps. It connects users' intent with real-world geographic information. Lately …

Towards Cross-Table Masked Pretraining for Web Data Mining

C Ye, G Lu, H Wang, L Li, S Wu, G Chen… - Proceedings of the ACM …, 2024 - dl.acm.org
Tabular data pervades the landscape of the World Wide Web, playing a foundational role in
the digital architecture that underpins online information. Given the recent influence of large …

Simulating users in interactive web table retrieval

B Engelmann, T Breuer, P Schaer - Proceedings of the 32nd ACM …, 2023 - dl.acm.org
Considering the multimodal signals of search items is beneficial for retrieval effectiveness.
Especially in web table retrieval (WTR) experiments, accounting for multimodal properties of …

DAME: Domain adaptation for matching entities

M Trabelsi, J Heflin, J Cao - … Conference on Web Search and Data …, 2022 - dl.acm.org
Entity matching (EM) identifies data records that refer to the same real-world entity. Despite
the effort in the past years to improve the performance in EM, the existing methods still …