Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality
tabular data for model training remains a significant obstacle. Numerous works have …
Table meets LLM: Can large language models understand structured table data? A benchmark and empirical study
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve
Natural Language (NL)-related tasks. However, there is still much to learn about how well …
HYTREL: Hypergraph-enhanced tabular data representation learning
Language models pretrained on large collections of tabular data have
demonstrated their effectiveness in several downstream tasks. However, many of these …
Entrant: A large financial dataset for table understanding
Tabular data is a way to structure, organize, and present information conveniently and
effectively. Real-world tables present data in two dimensions by arranging cells in matrices …
SpaBERT: A pretrained language model from geographic data for geo-entity representation
Named geographic entities (geo-entities for short) are the building blocks of many
geographic datasets. Characterizing geo-entities is integral to various application domains …
A Large Scale Test Corpus for Semantic Table Search
Table search aims to answer a query with a ranked list of tables. Unfortunately, current test
corpora have focused mostly on needle-in-the-haystack tasks, where only a few tables are …
MGeo: Multi-modal geographic language model pre-training
Query and point of interest (POI) matching is a core task in location-based services (LBS),
e.g., navigation maps. It connects users' intent with real-world geographic information. Lately …
Towards Cross-Table Masked Pretraining for Web Data Mining
Tabular data pervades the landscape of the World Wide Web, playing a foundational role in
the digital architecture that underpins online information. Given the recent influence of large …
Simulating users in interactive web table retrieval
Considering the multimodal signals of search items is beneficial for retrieval effectiveness.
Especially in web table retrieval (WTR) experiments, accounting for multimodal properties of …
DAME: Domain adaptation for matching entities
Entity matching (EM) identifies data records that refer to the same real-world entity. Despite
the effort in the past years to improve the performance in EM, the existing methods still …