BERTweet: A pre-trained language model for English Tweets
We present BERTweet, the first public large-scale pre-trained language model for English
Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is …
BERT rediscovers the classical NLP pipeline
I Tenney - arXiv preprint arXiv:1905.05950, 2019
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We
focus on one such model, BERT, and aim to quantify where linguistic information is captured …
Masked language modeling and the distributional hypothesis: Order word matters pre-training for little
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …
Linguistic Knowledge and Transferability of Contextual Representations
NF Liu - arXiv preprint arXiv:1903.08855, 2019
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …
Evaluating models' local decision boundaries via contrast sets
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …
What do you learn from context? Probing for sentence structure in contextualized word representations
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT
(Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of …
Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work?
While pretrained models such as BERT have shown large gains across natural language
understanding tasks, their performance can be improved by further training the model on a …
A resource-rational model of human processing of recursive linguistic structure
A major goal of psycholinguistic theory is to account for the cognitive constraints limiting the
speed and ease of language comprehension and production. Wide-ranging evidence …
Automatic mining of opinions expressed about APIs in Stack Overflow
With the proliferation of online developer forums, developers share their opinions about the
APIs they use. The plethora of such information can present challenges to the developers to …
When do you need billions of words of pretraining data?
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …