Survey: Image mixing and deleting for data augmentation

H Naveed, S Anwar, M Hayat, K Javed… - Engineering Applications of …, 2024 - Elsevier
Neural networks are prone to overfitting and memorizing data patterns. To avoid overfitting
and enhance their generalization and performance, various methods have been suggested …
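
The title names the two augmentation families this survey covers: mixing images together and deleting image regions. As a hedged illustration only (not code from the survey; the function names, patch size, and Beta mixing prior are our assumptions), a minimal NumPy sketch of one operation from each family:

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_images(x1, x2, alpha=1.0):
    # Mixing (mixup-style): convex combination of two images. The same
    # lam would also blend the two labels: y = lam * y1 + (1 - lam) * y2.
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam

def erase_patch(x, patch=8):
    # Deleting (Cutout-style): zero out one randomly placed square patch.
    h, w = x.shape[:2]
    top = int(rng.integers(0, h - patch))
    left = int(rng.integers(0, w - patch))
    out = x.copy()
    out[top:top + patch, left:left + patch] = 0.0
    return out

a, b = rng.random((32, 32)), rng.random((32, 32))  # toy grayscale "images"
mixed, lam = mix_images(a, b)
print(lam, mixed.shape, erase_patch(a).shape)
```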

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …
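
Many of the NLP augmentation methods such surveys cover start from simple token-level edits. As one concrete, well-known example, random word deletion in the style of EDA (Wei and Zou, 2019); the function name and default deletion probability below are illustrative:

```python
import random

def random_word_deletion(sentence: str, p: float = 0.1, seed: int = 0) -> str:
    # Drop each token independently with probability p, keeping at least
    # one token so the augmented example never becomes empty.
    rng = random.Random(seed)
    tokens = sentence.split()
    kept = [t for t in tokens if rng.random() > p]
    return " ".join(kept) if kept else rng.choice(tokens)

print(random_word_deletion("data augmentation creates new training examples", p=0.3))
```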

A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability

C Cao, F Zhou, Y Dai, J Wang, K Zhang - ACM Computing Surveys, 2024 - dl.acm.org
Data augmentation (DA) is indispensable in modern machine learning and deep neural
networks. The basic idea of DA is to construct new training data to improve the model's …
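
The canonical mix-based method is mixup (Zhang et al., 2018), which constructs new training data as convex combinations of input pairs and their one-hot labels. A minimal sketch, assuming the Beta(alpha, alpha) mixing coefficient of the original mixup paper (the names and toy data are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y, alpha=0.2):
    # Pair each example with a shuffled partner and take the same convex
    # combination of the inputs and the one-hot labels.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[perm], lam * y + (1.0 - lam) * y[perm]

x = rng.random((4, 8))                     # 4 examples, 8 features
y = np.eye(3)[rng.integers(0, 3, size=4)]  # one-hot labels, 3 classes
x_mix, y_mix = mixup_batch(x, y)
print(x_mix.shape, y_mix.sum(axis=1))      # mixed labels still sum to 1
```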

A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org
In this work, we provide a survey of active learning (AL) and its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …
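
Uncertainty sampling is one of the query strategies such a categorization typically includes: the learner requests labels for the pool examples its current model is least confident about. A generic, hedged sketch using entropy scoring (not tied to this survey's taxonomy):

```python
import numpy as np

def entropy_query(probs: np.ndarray, k: int) -> np.ndarray:
    # Score each unlabeled example by the entropy of its predicted class
    # distribution and return the indices of the k most uncertain ones.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-k:][::-1]

# Toy pool of 5 unlabeled examples with model-predicted probabilities.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident -> low entropy
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # near-uniform -> highest entropy
    [0.90, 0.05, 0.05],
])
print(entropy_query(probs, k=2))  # -> [3 1]
```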

An empirical survey of data augmentation for limited data learning in NLP

J Chen, D Tam, C Raffel, M Bansal… - Transactions of the …, 2023 - direct.mit.edu
NLP has achieved great progress in the past decade through the use of neural models and
large labeled datasets. The dependence on abundant data prevents NLP models from being …

C-Mixup: Improving generalization in regression

H Yao, Y Wang, L Zhang, JY Zou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Improving the generalization of deep networks is an important open challenge, particularly
in domains without plentiful data. The mixup algorithm improves generalization by linearly …
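
Vanilla mixup draws the pair to interpolate uniformly at random; C-Mixup's change for regression is, roughly, to sample the mixing partner in proportion to label similarity, so examples with very different targets are rarely mixed. A sketch under that reading (the Gaussian-kernel weighting, bandwidth, and names here are our illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def c_mixup_pair(x, y, i, sigma=1.0, alpha=2.0):
    # Draw partner j with probability proportional to a Gaussian kernel
    # on the label distance |y_i - y_j|, then mix as in vanilla mixup.
    weights = np.exp(-((y - y[i]) ** 2) / (2.0 * sigma ** 2))
    weights[i] = 0.0  # never mix an example with itself
    j = rng.choice(len(x), p=weights / weights.sum())
    lam = rng.beta(alpha, alpha)
    return lam * x[i] + (1 - lam) * x[j], lam * y[i] + (1 - lam) * y[j]

x = rng.random((100, 5))   # toy regression inputs
y = x.sum(axis=1)          # scalar targets
print(c_mixup_pair(x, y, i=0))
```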

PRBoost: Prompt-based rule discovery and boosting for interactive weakly-supervised learning

R Zhang, Y Yu, P Shetty, L Song, C Zhang - arXiv preprint arXiv …, 2022 - arxiv.org
Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity
in many NLP tasks, but manually designing a comprehensive, high-quality labeling rule set …

Towards domain-agnostic contrastive learning

V Verma, T Luong, K Kawaguchi… - … on Machine Learning, 2021 - proceedings.mlr.press
Despite recent successes, most contrastive self-supervised learning methods are domain-
specific, relying heavily on data augmentation techniques that require knowledge about a …
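
One domain-agnostic recipe in the direction this title suggests is to build positive views by mixing an example with a randomly chosen other example, instead of applying modality-specific transforms, and then train with a standard InfoNCE objective. The sketch below is our hedged reconstruction of that general recipe, not the paper's exact method; for brevity it scores raw features directly rather than encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_view(x, lam_max=0.2):
    # Domain-agnostic "augmentation": nudge each example toward a random
    # other example; no image- or text-specific operations are required.
    lam = rng.uniform(0.0, lam_max)
    return (1.0 - lam) * x + lam * x[rng.permutation(len(x))]

def info_nce(z1, z2, tau=0.1):
    # InfoNCE between two views of the same batch; positive pairs sit on
    # the diagonal of the cosine-similarity matrix.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    return np.mean(np.log(np.exp(logits).sum(axis=1)) - np.diag(logits))

x = rng.random((8, 16))  # toy batch of generic feature vectors
print(info_nce(mixup_view(x), mixup_view(x)))
```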

Cold-start data selection for few-shot language model fine-tuning: A prompt-based uncertainty propagation approach

Y Yu, R Zhang, R Xu, J Zhang, J Shen… - arXiv preprint arXiv …, 2022 - arxiv.org
Large Language Models have demonstrated remarkable few-shot performance, but the
performance can be sensitive to the selection of few-shot instances. We propose PATRON, a …

Denoising multi-source weak supervision for neural text classification

W Ren, Y Li, H Su, D Kartchner, C Mitchell… - arXiv preprint arXiv …, 2020 - arxiv.org
We study the problem of learning neural text classifiers without using any labeled data, but
only easy-to-provide rules as multiple weak supervision sources. This problem is …
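
The simplest way to turn multiple weak rules into training labels is a per-example majority vote over the rules that fire; denoising approaches such as this one aim to improve on that noisy baseline. A hedged sketch of the baseline only (the -1 abstain convention and names are ours):

```python
import numpy as np

ABSTAIN = -1

def majority_vote(rule_labels: np.ndarray) -> np.ndarray:
    # rule_labels has shape (n_examples, n_rules); ABSTAIN marks rules
    # that did not fire. Each example gets the most common fired label.
    out = np.full(len(rule_labels), ABSTAIN)
    for i, votes in enumerate(rule_labels):
        votes = votes[votes != ABSTAIN]
        if votes.size:
            vals, counts = np.unique(votes, return_counts=True)
            out[i] = vals[np.argmax(counts)]
    return out

# 4 examples x 3 labeling rules (classes 0/1).
rules = np.array([
    [1, 1, ABSTAIN],
    [0, ABSTAIN, 0],
    [ABSTAIN, ABSTAIN, ABSTAIN],  # no rule fired -> stays unlabeled
    [1, 0, 0],
])
print(majority_vote(rules))  # -> [ 1  0 -1  0]
```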