A survey of active learning for natural language processing
In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …
Unsupervised domain clusters in pretrained language models
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data
varies in many nuanced linguistic aspects such as topic, style or level of formality. In …
Competence-based curriculum learning for neural machine translation
Current state-of-the-art NMT systems use large neural networks that are not only slow to
train, but also often require many heuristics and optimization tricks, such as specialized …
A review of Bangla natural language processing tasks and the utility of transformer models
Bangla--ranked as the 6th most widely spoken language across the world
(https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers--is still …
Active learning and crowdsourcing for machine translation in low resource scenarios
V Ambati - 2012 - search.proquest.com
Corpus based approaches to automatic translation such as Example Based and Statistical
Machine Translation systems use large amounts of parallel data created by humans to train …
Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation
Despite being the seventh most widely spoken language in the world, Bengali has received
much less attention in machine translation literature due to being low in resources. Most …
Submodularity for data selection in machine translation
We introduce submodular optimization to the problem of training data subset selection for
statistical machine translation (SMT). By explicitly formulating data selection as a …
Which examples to annotate for in-context learning? Towards effective and efficient selection
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is
efficient as it does not require any parameter updates to the trained LLM, but only few …
Active learning for abstractive text summarization
Construction of human-curated annotated datasets for abstractive text summarization (ATS)
is very time-consuming and expensive because creating each instance requires a human …
Active learning for interactive neural machine translation of data streams
We study the application of active learning techniques to the translation of unbounded data
streams via interactive neural machine translation. The main idea is to select, from an …