Data augmentation approaches in natural language processing: A survey

B Li, Y Hou, W Che - AI Open, 2022 - Elsevier
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where
deep learning techniques may fail. It is widely applied in computer vision then introduced to …

Neural machine translation for low-resource languages: A survey

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

An analysis of simple data augmentation for named entity recognition

X Dai, H Adel - arXiv preprint arXiv:2010.11683, 2020 - arxiv.org
Simple yet effective data augmentation techniques have been proposed for sentence-level
and sentence-pair natural language processing tasks. Inspired by these efforts, we design …

PromDA: Prompt-based data augmentation for low-resource NLU tasks

Y Wang, C Xu, Q Sun, H Hu, C Tao, X Geng… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper focuses on the Data Augmentation for low-resource Natural Language
Understanding (NLU) tasks. We propose Prompt-based Data Augmentation model …

A review: preprocessing techniques and data augmentation for sentiment analysis

HT Duong, TA Nguyen-Thi - Computational Social Networks, 2021 - Springer
In literature, the machine learning-based studies of sentiment analysis are usually
supervised learning which must have pre-labeled datasets to be large enough in certain …

KMMLU: Measuring massive multitask language understanding in Korean

G Son, H Lee, S Kim, S Kim, N Muennighoff… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice
questions across 45 subjects ranging from humanities to STEM. While prior Korean …

A comparison of transformer and recurrent neural networks on multilingual neural machine translation

SM Lakew, M Cettolo, M Federico - arXiv preprint arXiv:1806.06957, 2018 - arxiv.org
Recently, neural machine translation (NMT) has been extended to multilinguality, that is to
handle more than one translation direction with a single system. Multilingual NMT showed …

Exploring new frontiers in agricultural NLP: Investigating the potential of large language models for food applications

S Rezayi, Z Liu, Z Wu, C Dhakal, B Ge… - … Transactions on Big …, 2024 - ieeexplore.ieee.org
This paper explores new frontiers in agricultural natural language processing (NLP) by
investigating the effectiveness of food-related text corpora for pretraining transformer-based …

AgriBERT: Knowledge-Infused Agricultural Language Models for Matching Food and Nutrition.

S Rezayi, Z Liu, Z Wu, C Dhakal, B Ge, C Zhen, T Liu… - IJCAI, 2022 - researchgate.net
Pretraining domain-specific language models remains an important challenge which limits
their applicability in various areas such as agriculture. This paper investigates the …