AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q **e, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …
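
A rough sketch of how exact memorization is typically operationalized in this line of work: prompt the model with a prefix taken from a training sequence and count how many of its greedy (argmax) next-token predictions reproduce the original continuation. The helper below is an illustrative assumption, not the paper's released code; model(input_ids) is assumed to return logits of shape (batch, seq_len, vocab).

    import torch

    @torch.no_grad()
    def exact_memorization_rate(model, token_ids, context_len=32):
        # token_ids: 1-D LongTensor holding one training sequence
        input_ids = token_ids.unsqueeze(0)        # (1, seq_len)
        logits = model(input_ids)                 # (1, seq_len, vocab), assumed
        preds = logits.argmax(dim=-1)[0]          # greedy prediction for the next token at each position
        targets = token_ids[context_len:]         # ground-truth continuation after the prefix
        guesses = preds[context_len - 1:-1]       # predictions aligned with those targets
        return (guesses == targets).float().mean().item()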

Do vision transformers see like convolutional neural networks?

M Raghu, T Unterthiner, S Kornblith… - Advances in neural …, 2021 - proceedings.neurips.cc
Convolutional neural networks (CNNs) have so far been the de-facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …

The neural architecture of language: Integrative modeling converges on predictive processing

M Schrimpf, IA Blank, G Tuckute… - Proceedings of the …, 2021 - National Acad Sciences
The neuroscience of perception has recently been revolutionized with an integrative
modeling approach in which computation, brain function, and behavior are linked across …

Revisiting few-sample BERT fine-tuning

T Zhang, F Wu, A Katiyar, KQ Weinberger… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper is a study of fine-tuning of BERT contextual representations, with a focus on
commonly observed instabilities in few-sample scenarios. We identify several factors that …
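
One of the stabilization strategies examined in this paper is re-initializing the top encoder layers before fine-tuning. A minimal sketch for a Hugging Face BertModel follows; the function name, the number of layers, and std=0.02 (BERT's default initializer range) are illustrative choices, not the paper's exact recipe.

    import torch
    from transformers import BertModel

    def reinit_top_layers(model: BertModel, num_layers: int = 2, std: float = 0.02):
        # Re-initialize the weights of the top `num_layers` transformer blocks
        for layer in model.encoder.layer[-num_layers:]:
            for module in layer.modules():
                if isinstance(module, torch.nn.Linear):
                    module.weight.data.normal_(mean=0.0, std=std)
                    if module.bias is not None:
                        module.bias.data.zero_()
                elif isinstance(module, torch.nn.LayerNorm):
                    module.weight.data.fill_(1.0)
                    module.bias.data.zero_()

    model = BertModel.from_pretrained("bert-base-uncased")
    reinit_top_layers(model, num_layers=2)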

Achieving forgetting prevention and knowledge transfer in continual learning

Z Ke, B Liu, N Ma, H Xu, L Shu - Advances in Neural …, 2021 - proceedings.neurips.cc
Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving
two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge …

On the effectiveness of adapter-based tuning for pretrained language model adaptation

R He, L Liu, H Ye, Q Tan, B Ding, L Cheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Adapter-based tuning has recently arisen as an alternative to fine-tuning. It works by adding
light-weight adapter modules to a pretrained language model (PrLM) and only updating the …
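
A minimal sketch of the bottleneck adapter design this family of methods builds on: a down-projection, nonlinearity, and up-projection with a residual connection, inserted after a transformer sub-layer while the PrLM weights stay frozen. The hidden and bottleneck sizes and the near-identity initialization are illustrative assumptions, not values taken from the paper.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_size)
            self.act = nn.GELU()
            # near-identity init so training starts from the frozen PrLM's behavior
            nn.init.normal_(self.down.weight, std=1e-3)
            nn.init.zeros_(self.down.bias)
            nn.init.zeros_(self.up.weight)
            nn.init.zeros_(self.up.bias)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            return hidden_states + self.up(self.act(self.down(hidden_states)))

During adaptation, only these adapter parameters (and typically layer norms) are marked trainable; the rest of the PrLM is frozen, e.g. with requires_grad_(False).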

All bark and no bite: Rogue dimensions in transformer language models obscure representational quality

W Timkey, M Van Schijndel - arXiv preprint arXiv:2109.04404, 2021 - arxiv.org
Similarity measures are a vital tool for understanding how language models represent and
process language. Standard representational similarity measures such as cosine similarity …
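
A small sketch of the issue raised here: plain cosine similarity can be dominated by a handful of high-magnitude "rogue" dimensions. Standardizing each dimension against a sample of representations before comparing is one common correction in this line of work, not necessarily the paper's exact procedure.

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def standardized_cosine(u, v, reps):
        # reps: (n_tokens, hidden_size) sample of representations used to estimate
        # per-dimension mean and std, so no single dimension dominates the comparison
        mu, sigma = reps.mean(axis=0), reps.std(axis=0) + 1e-8
        return cosine((u - mu) / sigma, (v - mu) / sigma)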

Semantic structure in deep learning

E Pavlick - Annual Review of Linguistics, 2022 - annualreviews.org
Deep learning has recently come to dominate computational linguistics, leading to claims of
human-level performance in a range of language processing tasks. Like much previous …