Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Language models are unsupervised multitask learners

A Radford, J Wu, R Child, D Luan… - OpenAI …, 2019
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised …

Adversarial NLI: A new benchmark for natural language understanding

Y Nie, A Williams, E Dinan, M Bansal, J Weston… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a new large-scale NLI benchmark dataset, collected via an iterative,
adversarial human-and-model-in-the-loop procedure. We show that training models on this …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

Sparse, dense, and attentional representations for text retrieval

Y Luan, J Eisenstein, K Toutanova… - Transactions of the …, 2021 - direct.mit.edu
Dual encoders perform retrieval by encoding documents and queries into dense low-
dimensional vectors, scoring each document by its inner product with the query. We …
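As a rough illustration of the dual-encoder retrieval setup described in that snippet, the sketch below scores each document by the inner product between dense query and document vectors. The `encode` function is a hypothetical placeholder (random vectors), not the paper's encoder; any model mapping text to a fixed-size dense vector would take its place.

```python
# Minimal dual-encoder retrieval sketch (NumPy only).
import numpy as np

def encode(texts, dim=128, seed=0):
    # Placeholder encoder: deterministic random vectors stand in for a
    # learned text encoder that would produce dense low-dimensional embeddings.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(texts), dim))

docs = ["doc one ...", "doc two ...", "doc three ..."]
doc_vecs = encode(docs, seed=1)                    # documents encoded once, offline
query_vec = encode(["example query"], seed=2)[0]   # query encoded at search time

scores = doc_vecs @ query_vec                      # inner-product score per document
ranking = np.argsort(-scores)                      # highest score first
print([docs[i] for i in ranking])
```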

Information-theoretic probing with minimum description length

E Voita, I Titov - arXiv preprint arXiv:2003.12298, 2020 - arxiv.org
To measure how well pretrained representations encode some linguistic property, it is
common to use the accuracy of a probe, i.e., a classifier trained to predict the property from the …
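To make concrete what "accuracy of a probe" means in that snippet, here is a minimal, self-contained sketch: a linear classifier is trained to predict a label from frozen representations, and its held-out accuracy is the conventional probing metric that MDL probing refines. Synthetic features stand in for real pretrained representations, and scikit-learn is assumed to be available.

```python
# Toy probing sketch: linear probe on frozen (here, synthetic) representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 64))      # stand-in for frozen model representations
labels = (reps[:, 0] > 0).astype(int)   # stand-in for a linguistic property

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```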

Backdoor learning for NLP: Recent advances, challenges, and future research directions

M Omar - arXiv preprint arXiv:2302.06801, 2023 - arxiv.org
Although backdoor learning is an active research topic in the NLP domain, the literature
lacks studies that systematically categorize and summarize backdoor attacks and defenses …

Pre-training via paraphrasing

M Lewis, M Ghazvininejad, G Ghosh… - Advances in …, 2020 - proceedings.neurips.cc
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an
unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an …

Probing the probing paradigm: Does probing accuracy entail task relevance?

A Ravichander, Y Belinkov, E Hovy - arXiv preprint arXiv:2005.00719, 2020 - arxiv.org
Although neural models have achieved impressive results on several NLP benchmarks, little
is understood about the mechanisms they use to perform language tasks. Thus, much recent …

Do attention heads in BERT track syntactic dependencies?

PM Htut, J Phang, S Bordia, SR Bowman - arXiv preprint arXiv:1911.12246, 2019 - arxiv.org
We investigate the extent to which individual attention heads in pretrained transformer
language models, such as BERT and RoBERTa, implicitly capture syntactic dependency …