Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …
ExceLint: automatically finding spreadsheet formula errors
Spreadsheets are one of the most widely used programming environments, and are widely
deployed in domains like finance where errors can have catastrophic consequences. We …
SpreadsheetCoder: Formula prediction from semi-structured context
Spreadsheet formula prediction has been an important program synthesis problem with
many real-world applications. Previous works typically utilize input-output examples as the …
Fortap: Using formulas for numerical-reasoning-aware table pretraining
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In
this paper, we find that the spreadsheet formula, which performs calculations on numerical …
NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries
Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a
widespread practice among users performing data analysis. However, crafting formulas on …
Auto-detect: Data-driven error detection in tables
Given a single column of values, existing approaches typically employ regex-like rules to
detect errors by finding anomalous values inconsistent with others. Such techniques make …
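As a rough illustration of the regex-like, single-column approach the snippet attributes to *existing* methods (not Auto-Detect's own data-driven technique), the following Python sketch generalizes each value into a coarse character-class shape and flags values whose shape is rare within the column. The function names `value_pattern` and `flag_anomalies` and the `min_share=0.2` threshold are hypothetical choices made for this example.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Generalize a value into a coarse, regex-like shape.

    Hypothetical rule: each run of letters becomes "a" and each run of
    digits becomes "9"; punctuation and spaces are kept as-is.
    """
    shape = re.sub(r"[A-Za-z]+", "a", value)  # letters first, so "a" is never re-matched
    shape = re.sub(r"[0-9]+", "9", shape)     # then digits ("a" contains no digits)
    return shape


def flag_anomalies(column: list[str], min_share: float = 0.2) -> list[str]:
    """Return values whose shape covers less than `min_share` of the column."""
    patterns = [value_pattern(v) for v in column]
    counts = Counter(patterns)
    n = len(column)
    return [v for v, p in zip(column, patterns) if counts[p] / n < min_share]


column = [
    "2021-01-05", "2021-02-10", "2021-03-15",
    "2021-04-20", "2021-05-25", "20210630",
]
print(flag_anomalies(column))  # ['20210630']
```

Five values share the shape `9-9-9`, while `20210630` collapses to `9` and covers only 1/6 of the column, so it is the only value flagged. Fixed shape rules like these are exactly the kind of heuristic that misfires on columns mixing several legitimate formats, which is the weakness the Auto-Detect snippet goes on to address.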
SpreadsheetBench: towards challenging real world spreadsheet manipulation
We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark
exclusively derived from real-world scenarios, designed to immerse current large language …
Smelly relations: measuring and understanding database schema quality
Context: Databases are an integral element of enterprise applications. Similarly to code,
database schemas are also prone to smells, i.e., best-practice violations. Objective: We aim to …
Semantic table structure identification in spreadsheets
Spreadsheets are widely used in various business tasks and contain large amounts of valuable
data. However, spreadsheet tables are usually organized in a semi-structured way, and …
Spreadsheet debugging: The perils of tool over-reliance
Spreadsheets are widely used in organizations for various purposes such as data
aggregation, reporting and decision-making. Since spreadsheets, like other types of …