Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

ExceLint: automatically finding spreadsheet formula errors

DW Barowy, ED Berger, B Zorn - Proceedings of the ACM on …, 2018 - dl.acm.org
Spreadsheets are one of the most widely used programming environments, and are widely
deployed in domains like finance where errors can have catastrophic consequences. We …

SpreadsheetCoder: Formula prediction from semi-structured context

X Chen, P Maniatis, R Singh, C Sutton… - International …, 2021 - proceedings.mlr.press
Spreadsheet formula prediction has been an important program synthesis problem with
many real-world applications. Previous works typically utilize input-output examples as the …

FORTAP: Using formulas for numerical-reasoning-aware table pretraining

Z Cheng, H Dong, R Jia, P Wu, S Han, F Cheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In
this paper, we find that the spreadsheet formula, which performs calculations on numerical …

NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

W Zhao, Z Hou, S Wu, Y Gao, H Dong, Y Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a
widespread practice among users performing data analysis. However, crafting formulas on …

Auto-detect: Data-driven error detection in tables

Z Huang, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org
Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …

SpreadsheetBench: towards challenging real world spreadsheet manipulation

Z Ma, B Zhang, J Zhang, J Yu, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark
exclusively derived from real-world scenarios, designed to immerse current large language …

Smelly relations: measuring and understanding database schema quality

T Sharma, M Fragkoulis, S Rizou, M Bruntink… - Proceedings of the 40th …, 2018 - dl.acm.org
Context: Databases are an integral element of enterprise applications. Similarly to code,
database schemas are also prone to smells (best practice violations). Objective: We aim to …

Semantic table structure identification in spreadsheets

Y Zhang, X Lv, H Dong, W Dou, S Han… - Proceedings of the 30th …, 2021 - dl.acm.org
Spreadsheets are widely used in various business tasks, and contain amounts of valuable
data. However, spreadsheet tables are usually organized in a semi-structured way, and …

Spreadsheet debugging: The perils of tool over-reliance

A Mukhtar, B Hofer, D Jannach, F Wotawa - Journal of Systems and …, 2022 - Elsevier
Spreadsheets are widely used in organizations for various purposes such as data
aggregation, reporting and decision-making. Since spreadsheets, like other types of …