Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

A survey on deep learning for named entity recognition

J Li, A Sun, J Han, C Li - IEEE Transactions on Knowledge and …, 2020 - ieeexplore.ieee.org
Named entity recognition (NER) is the task of identifying mentions of rigid designators in text belonging to predefined semantic types such as person, location, organization, etc. NER …

Data2vec: A general framework for self-supervised learning in speech, vision and language

A Baevski, WN Hsu, Q Xu, A Babu… - … on Machine Learning, 2022 - proceedings.mlr.press
While the general idea of self-supervised learning is identical across modalities, the actual
algorithms and objectives differ widely because they were developed with a single modality …

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

LUKE: Deep contextualized entity representations with entity-aware self-attention

I Yamada, A Asai, H Shindo, H Takeda… - arXiv preprint arXiv …, 2020 - arxiv.org
Entity representations are useful in natural language tasks involving entities. In this paper,
we propose new pretrained contextualized representations of words and entities based on …

Less training, more repairing please: revisiting automated program repair via zero-shot learning

CS Xia, L Zhang - Proceedings of the 30th ACM Joint European …, 2022 - dl.acm.org
Due to the promising future of Automated Program Repair (APR), researchers have
proposed various APR techniques, including heuristic-based, template-based, and …

Don't stop pretraining: Adapt language models to domains and tasks

S Gururangan, A Marasović, S Swayamdipta… - arXiv preprint arXiv …, 2020 - arxiv.org
Language models pretrained on text from a wide variety of sources form the foundation of
today's NLP. In light of the success of these broad-coverage models, we investigate whether …

Efficient self-supervised learning with contextualized target representations for vision, speech and language

A Baevski, A Babu, WN Hsu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Current self-supervised learning algorithms are often modality-specific and require large
amounts of computational resources. To address these issues, we increase the training …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …