BERTweet: A pre-trained language model for English Tweets

DQ Nguyen, T Vu, AT Nguyen - arXiv preprint arXiv:2005.10200, 2020 - arxiv.org
We present BERTweet, the first public large-scale pre-trained language model for English
Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is …

BERT rediscovers the classical NLP pipeline

I Tenney - arXiv preprint arXiv:1905.05950, 2019 - fq.pkwyx.com
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We
focus on one such model, BERT, and aim to quantify where linguistic information is captured …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

Linguistic Knowledge and Transferability of Contextual Representations

NF Liu - arXiv preprint arXiv:1903.08855, 2019 - fq.pkwyx.com
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …

Evaluating models' local decision boundaries via contrast sets

M Gardner, Y Artzi, V Basmova, J Berant… - arXiv preprint arXiv …, 2020 - arxiv.org
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …

What do you learn from context? probing for sentence structure in contextualized word representations

I Tenney, P Xia, B Chen, A Wang, A Poliak… - arXiv preprint arXiv …, 2019 - arxiv.org
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT
(Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of …

Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work?

Y Pruksachatkun, J Phang, H Liu, PM Htut… - arXiv preprint arXiv …, 2020 - arxiv.org
While pretrained models such as BERT have shown large gains across natural language
understanding tasks, their performance can be improved by further training the model on a …

A resource-rational model of human processing of recursive linguistic structure

M Hahn, R Futrell, R Levy… - Proceedings of the …, 2022 - National Acad Sciences
A major goal of psycholinguistic theory is to account for the cognitive constraints limiting the
speed and ease of language comprehension and production. Wide-ranging evidence …

Automatic mining of opinions expressed about APIs in Stack Overflow

G Uddin, F Khomh - IEEE Transactions on Software …, 2019 - ieeexplore.ieee.org
With the proliferation of online developer forums, developers share their opinions about the
APIs they use. The plethora of such information can present challenges to the developers to …

When do you need billions of words of pretraining data?

Y Zhang, A Warstadt, HS Li, SR Bowman - arXiv preprint arXiv:2011.04946, 2020 - arxiv.org
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …