Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Language models are unsupervised multitask learners

A Radford, J Wu, R Child, D Luan… - OpenAI …, 2019
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised …

Adversarial NLI: A new benchmark for natural language understanding

Y Nie, A Williams, E Dinan, M Bansal, J Weston… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a new large-scale NLI benchmark dataset, collected via an iterative,
adversarial human-and-model-in-the-loop procedure. We show that training models on this …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

Sparse, dense, and attentional representations for text retrieval

Y Luan, J Eisenstein, K Toutanova… - Transactions of the …, 2021 - direct.mit.edu
Dual encoders perform retrieval by encoding documents and queries into dense low-
dimensional vectors, scoring each document by its inner product with the query. We …
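As a rough illustration of the dual-encoder retrieval setup described in that snippet, the sketch below scores each document by the inner product between dense query and document vectors. The `encode` function is a hypothetical placeholder (random vectors), not the paper's encoder; any model mapping text to a fixed-size dense vector would take its place.

```python
# Minimal dual-encoder retrieval sketch (NumPy only).
import numpy as np

def encode(texts, dim=128, seed=0):
    # Placeholder encoder: deterministic random vectors stand in for a
    # learned text encoder that would produce dense low-dimensional embeddings.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(texts), dim))

docs = ["doc one ...", "doc two ...", "doc three ..."]
doc_vecs = encode(docs, seed=1)                    # documents encoded once, offline
query_vec = encode(["example query"], seed=2)[0]   # query encoded at search time

scores = doc_vecs @ query_vec                      # inner-product score per document
ranking = np.argsort(-scores)                      # highest score first
print([docs[i] for i in ranking])
```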

Information-theoretic probing with minimum description length

E Voita, I Titov - arXiv preprint arXiv:2003.12298, 2020 - arxiv.org
To measure how well pretrained representations encode some linguistic property, it is
common to use the accuracy of a probe, i.e., a classifier trained to predict the property from the …
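To make concrete what "accuracy of a probe" means in that snippet, here is a minimal, self-contained sketch: a linear classifier is trained to predict a label from frozen representations, and its held-out accuracy is the conventional probing metric that MDL probing refines. Synthetic features stand in for real pretrained representations, and scikit-learn is assumed to be available.

```python
# Toy probing sketch: linear probe on frozen (here, synthetic) representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 64))      # stand-in for frozen model representations
labels = (reps[:, 0] > 0).astype(int)   # stand-in for a linguistic property

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```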

Backdoor learning for NLP: Recent advances, challenges, and future research directions

M Omar - arXiv preprint arXiv:2302.06801, 2023 - arxiv.org
Although backdoor learning is an active research topic in the NLP domain, the literature
lacks studies that systematically categorize and summarize backdoor attacks and defenses …

Pre-training via paraphrasing

M Lewis, M Ghazvininejad, G Ghosh… - Advances in …, 2020 - proceedings.neurips.cc
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an
unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an …

Probing the probing paradigm: Does probing accuracy entail task relevance?

A Ravichander, Y Belinkov, E Hovy - arXiv preprint arXiv:2005.00719, 2020 - arxiv.org
Although neural models have achieved impressive results on several NLP benchmarks, little
is understood about the mechanisms they use to perform language tasks. Thus, much recent …

Do attention heads in BERT track syntactic dependencies?

PM Htut, J Phang, S Bordia, SR Bowman - arXiv preprint arXiv:1911.12246, 2019 - arxiv.org
We investigate the extent to which individual attention heads in pretrained transformer
language models, such as BERT and RoBERTa, implicitly capture syntactic dependency …