QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

The debate over understanding in AI's large language models

M Mitchell, DC Krakauer - Proceedings of the National …, 2023 - National Acad Sciences
We survey a current, heated debate in the artificial intelligence (AI) research community on
whether large pretrained language models can be said to understand language—and the …

Impact of pretraining term frequencies on few-shot reasoning

Y Razeghi, RL Logan IV, M Gardner… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical
reasoning by extrapolating from a few examples in few-shot settings. However, the extent to …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

WANLI: Worker and AI collaboration for natural language inference dataset creation

A Liu, S Swayamdipta, NA Smith, Y Choi - arXiv preprint arXiv:2201.05955, 2022 - arxiv.org
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often
rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We …

Generating data to mitigate spurious correlations in natural language inference datasets

Y Wu, M Gardner, P Stenetorp, P Dasigi - arXiv preprint arXiv:2203.12942, 2022 - arxiv.org
Natural language processing models often exploit spurious correlations between task-
independent features and labels in datasets to perform well only within the distributions they …

Tailor: Generating and perturbing text with semantic controls

A Ross, T Wu, H Peng, ME Peters… - arXiv preprint arXiv …, 2021 - arxiv.org
Controlled text perturbation is useful for evaluating and improving model generalizability.
However, current techniques rely on training a model for every target perturbation, which is …

Changing the world by changing the data

A Rogers - arXiv preprint arXiv:2105.13947, 2021 - arxiv.org
The NLP community is currently investing much more research and resources into the development of
deep learning models than into training data. While we have made a lot of progress, it is now …