AMMUS: A survey of transformer-based pretrained models in natural language processing
KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
Pre-trained language models in biomedical domain: A systematic survey
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …
Memorization without overfitting: Analyzing the training dynamics of large language models
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …
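The snippet names exact memorization without spelling out a protocol; one common way to operationalize it (an assumption here, not the paper's stated method) is to prompt a causal LM with a training prefix and check whether greedy decoding reproduces the original continuation token for token, as in this minimal Hugging Face sketch:

```python
# Assumed operationalization of "exact memorization": greedy decoding from a
# training prefix must reproduce the original continuation token for token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def is_exactly_memorized(model, tokenizer, text, prefix_tokens=32):
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    prefix, target = ids[:prefix_tokens], ids[prefix_tokens:]
    with torch.no_grad():
        out = model.generate(prefix.unsqueeze(0),
                             max_new_tokens=target.numel(),
                             do_sample=False)  # greedy decoding
    # Compare the generated continuation against the original suffix.
    return torch.equal(out[0, prefix_tokens:], target)

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
print(is_exactly_memorized(lm, tok, "an example training sequence goes here", prefix_tokens=4))
```

A per-token variant (the fraction of positions where the argmax prediction matches the ground-truth token) is another common choice.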
Do vision transformers see like convolutional neural networks?
Convolutional neural networks (CNNs) have so far been the de facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …
The neural architecture of language: Integrative modeling converges on predictive processing
The neuroscience of perception has recently been revolutionized with an integrative
modeling approach in which computation, brain function, and behavior are linked across …
Revisiting few-sample BERT fine-tuning
This paper is a study of fine-tuning of BERT contextual representations, with a focus on
commonly observed instabilities in few-sample scenarios. We identify several factors that …
Achieving forgetting prevention and knowledge transfer in continual learning
Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving
two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge …
On the effectiveness of adapter-based tuning for pretrained language model adaptation
Adapter-based tuning has recently arisen as an alternative to fine-tuning. It works by adding
light-weight adapter modules to a pretrained language model (PrLM) and only updating the …
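The snippet describes the adapter mechanism only in outline; below is a minimal PyTorch sketch of one common design, a bottleneck adapter with a residual connection, where the pretrained model stays frozen and only adapter parameters are updated (sizes and names here are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus residual."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen PrLM's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Typical usage: freeze the pretrained model, train only adapters (and the task head).
# for p in pretrained_model.parameters():
#     p.requires_grad = False
# optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```

The small bottleneck keeps the number of trainable parameters a tiny fraction of the full model, which is what makes adapter tuning a lightweight alternative to full fine-tuning.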
All bark and no bite: Rogue dimensions in transformer language models obscure representational quality
Similarity measures are a vital tool for understanding how language models represent and
process language. Standard representational similarity measures such as cosine similarity …
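As a concrete illustration of the measure the snippet names (the numbers and the single-dimension manipulation are illustrative, not the paper's experiments), cosine similarity between hidden-state vectors can be dominated by one high-magnitude "rogue" dimension:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(768), rng.standard_normal(768)
print(round(cosine(a, b), 3))   # near 0: unrelated random vectors

# One outlier dimension with a large shared magnitude dominates the score.
a[42] += 100.0
b[42] += 100.0
print(round(cosine(a, b), 3))   # close to 1, driven almost entirely by dimension 42
```

This is the sense in which a few rogue dimensions can make otherwise dissimilar representations look alike under standard similarity measures.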
Semantic structure in deep learning
E Pavlick - Annual Review of Linguistics, 2022 - annualreviews.org
Deep learning has recently come to dominate computational linguistics, leading to claims of
human-level performance in a range of language processing tasks. Like much previous …