Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

Deep learning for table detection and structure recognition: A survey

M Salaheldin Kasem, A Abdallah, A Berendeyev… - ACM Computing …, 2024 - dl.acm.org
Tables are everywhere, from scientific journals, articles, websites, and newspapers all the
way to items we buy at the supermarket. Detecting them is thus of utmost importance to …

TUTA: Tree-based transformers for generally structured table pre-training

Z Wang, H Dong, R Jia, J Li, Z Fu, S Han… - Proceedings of the 27th …, 2021 - dl.acm.org
We propose TUTA, a unified pre-training architecture for understanding generally structured
tables. Noticing that understanding a table requires spatial, hierarchical, and semantic …

Large language models for tabular data: Progresses and future directions

H Dong, Z Wang - Proceedings of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …

EntRANT: A large financial dataset for table understanding

E Zavitsanos, D Mavroeidis, E Spyropoulou… - Scientific Data, 2024 - nature.com
Tabular data is a way to structure, organize, and present information conveniently and
effectively. Real-world tables present data in two dimensions by arranging cells in matrices …

Table understanding: Problem overview

A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …

FORTAP: Using formulas for numerical-reasoning-aware table pretraining

Z Cheng, H Dong, R Jia, P Wu, S Han, F Cheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In
this paper, we find that the spreadsheet formula, which performs calculations on numerical …

GetPt: Graph-enhanced General Table Pre-training with Alternate Attention Network

R Jia, H Guo, X **, C Yan, L Du, X Ma… - Proceedings of the 29th …, 2023 - dl.acm.org
Tables are widely used for data storage and presentation due to their high flexibility in
layout. The importance of tables as information carriers and the complexity of tabular data …

End-to-End Compound Table Understanding with Multi-Modal Modeling

Z Li, Y Li, Q Liang, P Li, Z Cheng, Y Niu, S Pu… - Proceedings of the 30th …, 2022 - dl.acm.org
Table is a widely used data form in webpages, spreadsheets, or PDFs to organize and
present structural data. Although studies on table structure recognition have been …

Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction

Q Zhang, VSJ Huang, B Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Document parsing is essential for converting unstructured and semi-structured documents
(such as contracts, academic papers, and invoices) into structured, machine-readable data …