A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org
In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …
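
The survey's subject is the categorization of query strategies; as a minimal, generic illustration of one such strategy (not taken from the paper), the sketch below implements pool-based least-confidence uncertainty sampling with an assumed scikit-learn-style classifier:

```python
# Minimal sketch of pool-based uncertainty sampling (least confidence),
# one common query-strategy family. `model` is assumed to be any
# scikit-learn-style classifier exposing predict_proba; names are illustrative.
import numpy as np

def least_confidence_query(model, X_pool, batch_size=16):
    """Return indices of the pool examples the model is least confident about."""
    probs = model.predict_proba(X_pool)          # shape (n_pool, n_classes)
    confidence = probs.max(axis=1)               # probability of the top class
    return np.argsort(confidence)[:batch_size]   # lowest confidence first
```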

Unsupervised domain clusters in pretrained language models

R Aharoni, Y Goldberg - arXiv preprint arXiv:2004.02105, 2020 - arxiv.org
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data
varies in many nuanced linguistic aspects such as topic, style or level of formality. In …
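
A rough sketch of the underlying idea, recovering domain clusters by clustering pretrained-LM sentence representations without labels, is given below; the specific model, pooling, and clustering settings are assumptions for illustration, not necessarily the paper's configuration:

```python
# Sketch: embed sentences with a pretrained LM and cluster the embeddings,
# letting "domains" emerge unsupervised. Model choice, mean pooling, and
# the number of mixture components are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.mixture import GaussianMixture

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    with torch.no_grad():
        batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
        hidden = lm(**batch).last_hidden_state            # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)      # (batch, seq, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

# Usage (illustrative):
#   domain_ids = GaussianMixture(n_components=5, random_state=0).fit_predict(embed(corpus))
```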

Competence-based curriculum learning for neural machine translation

EA Platanios, O Stretcu, G Neubig, B Poczos… - arXiv preprint arXiv …, 2019 - arxiv.org
Current state-of-the-art NMT systems use large neural networks that are not only slow to
train, but also often require many heuristics and optimization tricks, such as specialized …
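
The core mechanism is a "competence" schedule that grows during training, with only examples whose difficulty percentile falls below the current competence eligible for sampling. The sketch below shows a square-root competence schedule of this kind; the constants and the difficulty measure are illustrative assumptions:

```python
# Sketch of competence-based curriculum filtering: a competence value rises
# from c0 to 1 over training, and at each step only examples whose difficulty
# CDF value is below the current competence may be sampled.
import numpy as np

def competence(step, total_steps, c0=0.01):
    """Square-root competence schedule rising from c0 to 1."""
    return min(1.0, np.sqrt(step * (1 - c0**2) / total_steps + c0**2))

def eligible_indices(difficulty_cdf, step, total_steps):
    """difficulty_cdf[i]: CDF value of example i's difficulty (e.g. sentence length)."""
    c = competence(step, total_steps)
    return np.where(difficulty_cdf <= c)[0]
```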

A review of bangla natural language processing tasks and the utility of transformer models

F Alam, A Hasan, T Alam, A Khan, J Tajrin… - arXiv preprint arXiv …, 2021 - arxiv.org
Bangla--ranked as the 6th most widely spoken language across the world
(https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers--is still …

Active learning and crowdsourcing for machine translation in low resource scenarios

V Ambati - 2012 - search.proquest.com
Corpus-based approaches to automatic translation such as Example Based and Statistical
Machine Translation systems use large amounts of parallel data created by humans to train …

Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation

T Hasan, A Bhattacharjee, K Samin, M Hasan… - arXiv preprint arXiv …, 2020 - arxiv.org
Despite being the seventh most widely spoken language in the world, Bengali has received
much less attention in machine translation literature due to being low in resources. Most …

Submodularity for data selection in machine translation

K Kirchhoff, J Bilmes - Proceedings of the 2014 Conference on …, 2014 - aclanthology.org
We introduce submodular optimization to the problem of training data subset selection for
statistical machine translation (SMT). By explicitly formulating data selection as a …
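
When data selection is cast as maximizing a monotone submodular objective, a greedy algorithm gives the standard (1 - 1/e) approximation guarantee. The sketch below uses a common instantiation, concave-over-coverage of surface features; the feature extractor and objective are illustrative, not the paper's exact formulation:

```python
# Greedy selection under a monotone submodular objective
# f(S) = sum_over_features sqrt(count of feature in S).
# Feature extraction (whitespace tokens) is an illustrative choice.
import numpy as np
from collections import Counter

def greedy_submodular_select(sentences, budget, feats=lambda s: s.split()):
    """Pick `budget` sentences greedily by marginal gain of sqrt-coverage."""
    covered = Counter()
    selected = []
    remaining = set(range(len(sentences)))

    def gain(sent):
        new = Counter(feats(sent))
        return sum(np.sqrt(covered[f] + c) - np.sqrt(covered[f]) for f, c in new.items())

    for _ in range(min(budget, len(sentences))):
        best = max(remaining, key=lambda i: gain(sentences[i]))
        selected.append(best)
        covered.update(feats(sentences[best]))
        remaining.remove(best)
    return selected
```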

Which examples to annotate for in-context learning? Towards effective and efficient selection

C Mavromatis, B Srinivasan, Z Shen, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is
efficient as it does not require any parameter updates to the trained LLM, but only few …
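
A widely used baseline for choosing which examples to place in context is similarity-based retrieval, sketched below; the embedding model and the value of k are assumptions, and the paper's own selection criterion may differ:

```python
# Generic similarity-based demonstration selection for ICL: embed a candidate
# pool and the test input, then take the nearest neighbors as in-context
# examples. Encoder choice and k are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_demonstrations(test_input, pool, k=4):
    pool_emb = encoder.encode(pool, normalize_embeddings=True)        # (n, d)
    query_emb = encoder.encode([test_input], normalize_embeddings=True)[0]
    scores = pool_emb @ query_emb                                     # cosine similarity
    return [pool[i] for i in np.argsort(-scores)[:k]]
```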

Active learning for abstractive text summarization

A Tsvigun, I Lysenko, D Sedashov, I Lazichny… - arXiv preprint arXiv …, 2023 - arxiv.org
Construction of human-curated annotated datasets for abstractive text summarization (ATS)
is very time-consuming and expensive because creating each instance requires a human …

Active learning for interactive neural machine translation of data streams

Á Peris, F Casacuberta - arXiv preprint arXiv:1807.11243, 2018 - arxiv.org
We study the application of active learning techniques to the translation of unbounded data
streams via interactive neural machine translation. The main idea is to select, from an …
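
In stream-based selective sampling of this kind, each incoming sentence is translated and only low-confidence outputs are routed to a human. The sketch below illustrates that loop; the confidence measure, the threshold, and the `model.translate` / `model.update` / `ask_human` hooks are all assumptions introduced for illustration:

```python
# Stream-based selective sampling sketch: translate each incoming source
# sentence and request human supervision only when model confidence is low.
# `model.translate(src)` is assumed to return (hypothesis, token_logprobs),
# and `model.update` / `ask_human` are assumed hooks, not a real API.

def process_stream(stream, model, ask_human, threshold=-1.0):
    for src in stream:
        hyp, token_logprobs = model.translate(src)
        confidence = sum(token_logprobs) / max(len(token_logprobs), 1)
        if confidence < threshold:          # uncertain: request a human translation
            ref = ask_human(src)
            model.update(src, ref)          # online update on the new sentence pair
        # otherwise keep `hyp` as the system output
```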