Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data-scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision and was later introduced to …

A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …
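
To make "augmentation by transformations" concrete, here is a minimal sketch of one classic rule-based transformation (random word swap, as used in simple EDA-style methods); the function name and parameters are illustrative, not taken from the survey.

```python
import random

def random_swap(text: str, n_swaps: int = 1) -> str:
    """Return an augmented copy of `text` with random word pairs swapped."""
    words = text.split()
    if len(words) < 2:
        return text
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(random_swap("the quick brown fox jumps over the lazy dog", n_swaps=2))
```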

AugGPT: Leveraging ChatGPT for text data augmentation

H Dai, Z Liu, W Liao, X Huang, Y Cao… - … Transactions on Big …, 2025 - ieeexplore.ieee.org
Text data augmentation is an effective strategy for overcoming the challenge of limited
sample sizes in many natural language processing (NLP) tasks. This challenge is especially …
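
As an illustration of this style of LLM-based augmentation, the sketch below asks a chat model to rephrase a training sample several times. It assumes the official `openai` Python client (v1+); the model name and prompt wording are placeholders, not the paper's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase_augment(sample: str, n: int = 3) -> list[str]:
    """Ask a chat model for n meaning-preserving rephrasings of one sample."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model would do
        messages=[{
            "role": "user",
            "content": f"Rephrase the following sentence {n} times, "
                       f"one rephrasing per line, preserving its meaning:\n{sample}",
        }],
    )
    return resp.choices[0].message.content.splitlines()

print(paraphrase_augment("The patient reported mild chest pain."))
```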

User preference-aware fake news detection

Y Dou, K Shu, C Xia, PS Yu, L Sun - … of the 44th international ACM SIGIR …, 2021 - dl.acm.org
Disinformation and fake news have had detrimental effects on individuals and society in
recent years, attracting broad attention to fake news detection. The majority of existing fake …

GPT3Mix: Leveraging large-scale language models for text augmentation

KM Yoo, D Park, J Kang, SW Lee, W Park - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them
to be controlled via natural text prompts. Recent studies report that prompt-based direct …
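
A rough sketch of the prompt-construction step behind this kind of augmentation: a handful of labeled seed examples are formatted into a single prompt whose completion yields a new synthetic example together with a pseudo-label. The template wording below approximates the idea and is not the paper's exact template.

```python
def build_mix_prompt(examples, task="movie review", label_name="sentiment"):
    """Format labeled seed examples into a prompt for an LLM to continue.

    `examples` is a list of (text, label) pairs; the trailing "Text:" stub
    invites the model to emit a new example and its label in the same format.
    """
    lines = [f"Each item below is a {task} together with its {label_name}."]
    for text, label in examples:
        lines.append(f"Text: {text} ({label_name}: {label})")
    lines.append("Text:")  # the LLM completes a synthetic example here
    return "\n".join(lines)

seed = [("A gripping, beautifully shot film.", "positive"),
        ("Two hours of my life I will never get back.", "negative")]
print(build_mix_prompt(seed))
```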

AEDA: An easier data augmentation technique for text classification

A Karimi, L Rossi, A Prati - arXiv preprint arXiv:2108.13230, 2021 - arxiv.org
This paper proposes AEDA (An Easier Data Augmentation), a technique to help improve
performance on text classification tasks. AEDA includes only random insertion of …
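
The operation itself is small enough to sketch: insert between 1 and roughly n/3 punctuation marks at random positions in an n-word sentence. The helper name and the ratio parameter are illustrative; the punctuation set is the one the paper uses.

```python
import random

PUNCTUATION = [".", ";", "?", ":", "!", ","]

def aeda(sentence: str, punc_ratio: float = 1 / 3) -> str:
    """Augment a sentence by inserting random punctuation marks (AEDA-style)."""
    words = sentence.split()
    n_insert = random.randint(1, max(1, int(punc_ratio * len(words))))
    for _ in range(n_insert):
        pos = random.randint(0, len(words))  # any gap, including the ends
        words.insert(pos, random.choice(PUNCTUATION))
    return " ".join(words)

print(aeda("data augmentation can improve text classification"))
```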

Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions

JJY Chung, E Kamar, S Amershi - arXiv preprint arXiv:2306.04140, 2023 - arxiv.org
Large language models (LLMs) can be used to generate text data for training and evaluating
other models. However, creating high-quality datasets with LLMs can be challenging. In this …

Waffling around for performance: Visual classification with random words and broad concepts

K Roth, JM Kim, A Koepke, O Vinyals… - Proceedings of the …, 2023 - openaccess.thecvf.com
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …
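
The core trick is cheap to sketch: pad each class prompt with random character "words", then (in the full method) average the CLIP text embeddings of the resulting prompts. The template below only approximates the paper's wording; the function name and lengths are illustrative.

```python
import random
import string

def waffle_prompts(classname: str, n: int = 4, word_len: int = 5) -> list[str]:
    """Build CLIP-style prompts padded with random character 'words';
    their text embeddings would then be averaged per class."""
    prompts = []
    for _ in range(n):
        noise = "".join(random.choices(string.ascii_lowercase, k=word_len))
        prompts.append(f"a photo of a {classname}, which has {noise}.")
    return prompts

print(waffle_prompts("goldfinch"))
```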

STEMM: Self-learning with speech-text manifold mixup for speech translation

Q Fang, R Ye, L Li, Y Feng, M Wang - arXiv preprint arXiv:2203.10426, 2022 - arxiv.org
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
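
A simplified sketch of the mixup step, assuming speech and text embedding sequences that are already word-aligned and of equal length (the paper itself handles that alignment): each sequence position is drawn from one manifold or the other at random.

```python
import torch

def manifold_mixup(speech_emb: torch.Tensor,
                   text_emb: torch.Tensor,
                   p_text: float = 0.5) -> torch.Tensor:
    """Mix word-aligned speech and text embeddings along the sequence axis.

    Both inputs are (seq_len, dim); each position comes from the text
    manifold with probability p_text and from speech otherwise.
    """
    mask = (torch.rand(speech_emb.size(0), 1) < p_text).float()
    return mask * text_emb + (1.0 - mask) * speech_emb

speech = torch.randn(8, 256)
text = torch.randn(8, 256)
print(manifold_mixup(speech, text, p_text=0.4).shape)  # torch.Size([8, 256])
```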

AdaMV-MoE: Adaptive multi-task vision mixture-of-experts

T Chen, X Chen, X Du, A Rashwan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely activated Mixture-of-Experts (MoE) is becoming a promising paradigm for
multi-task learning (MTL). Instead of compressing multiple tasks' knowledge into a single …
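
For context, a minimal sparsely activated MoE layer with plain top-k routing is sketched below; the paper's contribution is adapting the number of active experts per task, which this sketch deliberately omits.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal sparse MoE layer: each token is routed to the top-k experts
    chosen by a learned gate, and their outputs are summed with gate weights."""

    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e              # tokens routed to expert e
                if sel.any():
                    out[sel] += weights[sel, slot:slot + 1] * expert(x[sel])
        return out

layer = TopKMoE(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```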