Text data augmentation for deep learning

C Shorten, TM Khoshgoftaar, B Furht - Journal of Big Data, 2021 - Springer
Abstract Natural Language Processing (NLP) is one of the most captivating applications of
Deep Learning. In this survey, we consider how the Data Augmentation training strategy can …

A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision then introduced to …

Klue: Korean language understanding evaluation

S Park, J Moon, S Kim, WI Cho, J Han, J Park… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a
collection of 8 Korean natural language understanding (NLU) tasks, including Topic …

Shortcut learning of large language models in natural language understanding

M Du, F He, N Zou, D Tao, X Hu - Communications of the ACM, 2023 - dl.acm.org
Communications of the ACM, January 2024, Vol. 67, No. 1 …

An empirical survey of data augmentation for limited data learning in NLP

J Chen, D Tam, C Raffel, M Bansal… - Transactions of the …, 2023 - direct.mit.edu
NLP has achieved great progress in the past decade through the use of neural models and
large labeled datasets. The dependence on abundant data prevents NLP models from being …

An analysis of simple data augmentation for named entity recognition

X Dai, H Adel - arXiv preprint arXiv:2010.11683, 2020 - arxiv.org
Simple yet effective data augmentation techniques have been proposed for sentence-level
and sentence-pair natural language processing tasks. Inspired by these efforts, we design …

What happens to BERT embeddings during fine-tuning?

A Merchant, E Rahimtoroghi, E Pavlick… - arXiv preprint arXiv …, 2020 - arxiv.org
While there has been much recent work studying how linguistic information is encoded in
pre-trained sentence representations, comparatively little is understood about how these …

How can we accelerate progress towards human-like linguistic generalization?

T Linzen - arXiv preprint arXiv:2005.00955, 2020 - arxiv.org
This position paper describes and critiques the Pretraining-Agnostic Identically Distributed
(PAID) evaluation paradigm, which has become a central tool for measuring progress in …

An empirical study on robustness to spurious correlations using pre-trained language models

L Tu, G Lalwani, S Gella, H He - Transactions of the Association for …, 2020 - direct.mit.edu
Recent work has shown that pre-trained language models such as BERT improve
robustness to spurious correlations in the dataset. Intrigued by these results, we find that the …