QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Large language model for table processing: A survey

W Lu, J Zhang, J Fan, Z Fu, Y Chen, X Du - Frontiers of Computer Science, 2025 - Springer
Tables, typically two-dimensional and structured to store large amounts of data, are
essential in daily activities like database queries, spreadsheet manipulations, Web table …

Making language models better reasoners with step-aware verifier

Y Li, Z Lin, S Zhang, Q Fu, B Chen… - Proceedings of the …, 2023 - aclanthology.org
Few-shot learning is a challenging task that requires language models to generalize from
limited examples. Large language models like GPT-3 and PaLM have made impressive …

AgentBench: Evaluating LLMs as agents

X Liu, H Yu, H Zhang, Y Xu, X Lei, H Lai, Y Gu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are becoming increasingly smart and autonomous,
targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has …

Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning

P Lu, L Qiu, KW Chang, YN Wu, SC Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
Mathematical reasoning, a core ability of human intelligence, presents unique challenges for
machines in abstract thinking and logical reasoning. Recent large pre-trained language …

FOLIO: Natural language reasoning with first-order logic

S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have achieved remarkable performance on a variety of
natural language understanding tasks. However, existing benchmarks are inadequate in …

FinQA: A dataset of numerical reasoning over financial data

Z Chen, W Chen, C Smiley, S Shah, I Borova… - arXiv preprint arXiv …, 2021 - arxiv.org
The sheer volume of financial statements makes it difficult for humans to access and analyze
a business's financials. Robust numerical reasoning likewise faces unique challenges in this …

TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance

F Zhu, W Lei, Y Huang, C Wang, S Zhang, J Lv… - arXiv preprint arXiv …, 2021 - arxiv.org
Hybrid data combining both tabular and textual content (e.g., financial reports) are quite
pervasive in the real world. However, Question Answering (QA) over such hybrid data is …

Large language models are few(1)-shot table reasoners

W Chen - arXiv preprint arXiv:2210.06710, 2022 - arxiv.org
Recent literature has shown that large language models (LLMs) are generally excellent few-
shot reasoners to solve text reasoning tasks. However, the capability of LLMs on table …

♫ MuSiQue: Multihop Questions via Single-hop Question Composition

H Trivedi, N Balasubramanian, T Khot… - Transactions of the …, 2022 - direct.mit.edu
Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to
be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by …