A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org
Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …

Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision then introduced to …

Increasing diversity while maintaining accuracy: Text data generation with large language models and human interventions

JJY Chung, E Kamar, S Amershi - arXiv preprint arXiv:2306.04140, 2023 - arxiv.org
Large language models (LLMs) can be used to generate text data for training and evaluating
other models. However, creating high-quality datasets with LLMs can be challenging. In this …

Label-specific feature augmentation for long-tailed multi-label text classification

P Xu, L Xiao, B Liu, S Lu, L Jing, J Yu - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Multi-label text classification (MLTC) involves tagging a document with its most relevant
subset of labels from a label set. In real applications, labels usually follow a long-tailed …

Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning

P Chen, J Wang, H Lin, D Zhao, Z Yang - Bioinformatics, 2023 - academic.oup.com
Motivation: Few-shot learning that can effectively perform named entity recognition in low-resource
scenarios has raised growing attention, but it has not been widely studied yet in the …

A survey on data augmentation in large model era

Y Zhou, C Guo, X Wang, Y Chang, Y Wu - arXiv preprint arXiv:2401.15422, 2024 - arxiv.org
Large models, encompassing large language and diffusion models, have shown
exceptional promise in approximating human-level intelligence, garnering significant …

Tabular and latent space synthetic data generation: a literature review

J Fonseca, F Bacao - Journal of Big Data, 2023 - Springer
The generation of synthetic data can be used for anonymization, regularization,
oversampling, semi-supervised learning, self-supervised learning, and several other tasks …

GENIUS: Sketch-based language model pre-training via extreme and selective masking for text generation and augmentation

B Guo, Y Gong, Y Shen, S Han, H Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GENIUS: a conditional text generation model using sketches as input, which
can fill in the missing contexts for a given sketch (key information consisting of textual spans …

DALE: Generative data augmentation for low-resource legal NLP

S Ghosh, CK Evuru, S Kumar… - arXiv preprint arXiv …, 2023 - arxiv.org
We present DALE, a novel and effective generative Data Augmentation framework for low-
resource LEgal NLP. DALE addresses the challenges existing frameworks pose in …

FewNLU: Benchmarking state-of-the-art methods for few-shot natural language understanding

Y Zheng, J Zhou, Y Qian, M Ding, C Liao, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org
The few-shot natural language understanding (NLU) task has attracted much recent
attention. However, prior methods have been evaluated under a disparate set of protocols …