A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

Bertology meets biology: Interpreting attention in protein language models

J Vig, A Madani, LR Varshney, C Xiong… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer architectures have proven to learn useful representations for protein
classification and generation tasks. However, these representations present challenges in …

What do they capture? A structural analysis of pre-trained language models for source code

Y Wan, W Zhao, H Zhang, Y Sui, G Xu… - Proceedings of the 44th …, 2022 - dl.acm.org
Recently, many pre-trained language models for source code have been proposed to model
the context of code and serve as a basis for downstream code intelligence tasks such as …

Probing pretrained language models for lexical semantics

I Vulić, EM Ponti, R Litschko, G Glavaš… - arXiv preprint arXiv …, 2020 - arxiv.org
The success of large pretrained language models (LMs) such as BERT and RoBERTa has
sparked interest in probing their representations, in order to unveil what types of knowledge …

From word types to tokens and back: A survey of approaches to word meaning representation and interpretation

M Apidianaki - Computational Linguistics, 2023 - direct.mit.edu
Vector-based word representation paradigms situate lexical meaning at different levels of
abstraction. Distributional and static embedding models generate a single vector per word …

A comparative evaluation and analysis of three generations of Distributional Semantic Models

A Lenci, M Sahlgren, P Jeuniaux… - Language Resources …, 2022 - Springer
Distributional semantics has deeply changed in the last decades. First, predict models stole
the thunder from traditional count ones, and more recently both of them were replaced in …

Dynamic contextualized word embeddings

V Hofmann, JB Pierrehumbert, H Schütze - arXiv preprint arXiv …, 2020 - arxiv.org
Static word embeddings that represent words by a single vector cannot capture the
variability of word meaning in different linguistic and extralinguistic contexts. Building on …

Semantics of multiword expressions in transformer-based models: A survey

F Miletić, SS Walde - … of the Association for Computational Linguistics, 2024 - direct.mit.edu
Multiword expressions (MWEs) are composed of multiple words and exhibit variable
degrees of compositionality. As such, their meanings are notoriously difficult to model, and it …

MedRoBERTa.nl: a language model for Dutch electronic health records

S Verkijk, P Vossen - Computational Linguistics in the Netherlands, 2021 - research.vu.nl
This paper presents MedRoBERTa.nl as the first Transformer-based language model for
Dutch medical language. We show that using 13GB of text data from Dutch hospital notes …

Obtaining better static word embeddings using contextual embedding models

P Gupta, M Jaggi - arXiv preprint arXiv:2106.04302, 2021 - arxiv.org
The advent of contextual word embeddings--representations of words which incorporate
semantic and syntactic information from their context--has led to tremendous improvements …