Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

Continual lifelong learning in natural language processing: A survey

M Biesialska, K Biesialska, MR Costa-Jussa - arXiv preprint arXiv …, 2020 - arxiv.org
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …

AdaptFormer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A natural next step is to adapt a ViT to various image and video recognition …
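The common thread in this line of work is a small bottleneck module added to a frozen backbone. Below is a minimal PyTorch sketch of such a parallel bottleneck adapter; module names, dimensions, and the scaling default are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project.

    Illustrative sketch of the adapter idea (e.g. AdaptFormer's parallel
    AdaptMLP branch); names and defaults are assumptions, not the authors' code.
    """

    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale
        # Start as a near-identity branch so pretrained behaviour is preserved.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.up(self.act(self.down(x)))


# Usage: freeze the backbone, train only the adapters.
x = torch.randn(2, 197, 768)          # (batch, tokens, ViT hidden dim)
adapter = BottleneckAdapter(dim=768)
print(adapter(x).shape)               # torch.Size([2, 197, 768])
```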

Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning

H Liu, D Tam, M Muqeeth, J Mohta… - Advances in …, 2022 - proceedings.neurips.cc
Few-shot in-context learning (ICL) enables pre-trained language models to perform a
previously-unseen task without any gradient-based training by feeding a small number of …
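The parameter-efficient recipe studied here (T-Few) rests on (IA)³, which learns element-wise rescaling vectors for selected activations while the pretrained weights stay frozen. The sketch below illustrates the idea on a single feed-forward block; it is a simplified illustration under stated assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class IA3FeedForward(nn.Module):
    """Feed-forward block with an (IA)^3-style learned rescaling vector.

    Only `l_ff` is trained; the frozen linear layers stand in for the
    pretrained model's weights. Simplified sketch, not the released code.
    """

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.wi = nn.Linear(dim, hidden, bias=False)   # frozen pretrained weight
        self.wo = nn.Linear(hidden, dim, bias=False)   # frozen pretrained weight
        self.l_ff = nn.Parameter(torch.ones(hidden))   # the only new parameters
        for p in (*self.wi.parameters(), *self.wo.parameters()):
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.wi(x))
        return self.wo(self.l_ff * h)   # element-wise rescale, then project back


block = IA3FeedForward(dim=768, hidden=3072)
trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
print(trainable)   # 3072 -- a tiny fraction of the block's parameters
```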

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged as a promising approach. Due to the ever-growing model size, the …
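The core module proposed here is a bottleneck adapter whose middle step is a depthwise 3D convolution over the video's space-time layout. A rough PyTorch sketch follows; tensor shapes and defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn

class STAdapterSketch(nn.Module):
    """Bottleneck adapter with a depthwise 3D conv for spatio-temporal modelling.

    Rough sketch of the ST-Adapter idea (down-project -> depthwise 3D conv
    over time/space -> up-project); shapes and defaults are assumptions.
    """

    def __init__(self, dim: int = 768, bottleneck: int = 128):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.dwconv = nn.Conv3d(
            bottleneck, bottleneck, kernel_size=3, padding=1, groups=bottleneck
        )
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor, t: int, h: int, w: int) -> torch.Tensor:
        # x: (batch, t*h*w, dim) -- flattened spatio-temporal tokens
        b, n, _ = x.shape
        z = self.down(x)                                   # (b, n, bottleneck)
        z = z.transpose(1, 2).reshape(b, -1, t, h, w)      # (b, c, T, H, W)
        z = self.dwconv(z)                                 # depthwise conv over T/H/W
        z = z.reshape(b, -1, n).transpose(1, 2)            # back to (b, n, c)
        return x + self.up(torch.relu(z))


x = torch.randn(2, 8 * 14 * 14, 768)
print(STAdapterSketch()(x, t=8, h=14, w=14).shape)   # torch.Size([2, 1568, 768])
```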

Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
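The many-to-many model released alongside this work (M2M-100) is available as public checkpoints; the snippet below shows a typical way to run a direct, non-English-pivoted translation with the Hugging Face transformers API. The checkpoint name and language codes are taken from the model hub and may differ across library versions.

```python
# pip install transformers sentencepiece
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Public M2M-100 checkpoint (418M parameters); name assumed from the model hub.
model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# Direct Chinese -> French translation, no English pivot.
tokenizer.src_lang = "zh"
inputs = tokenizer("生活就像一盒巧克力。", return_tensors="pt")
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.get_lang_id("fr")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```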

AdapterHub: A framework for adapting transformers

J Pfeiffer, A Rücklé, C Poth, A Kamath, I Vulić… - arXiv preprint arXiv …, 2020 - arxiv.org
The current modus operandi in NLP involves downloading and fine-tuning pre-trained
models consisting of millions or billions of parameters. Storing and sharing such large …
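AdapterHub's point is that only a few megabytes of adapter weights need to be stored and shared per task, rather than a full model copy. The sketch below follows the general workflow from the library's documentation; class and method names vary between versions of adapter-transformers and its successor packages, so treat them as assumptions.

```python
# pip install adapter-transformers   (the AdapterHub fork of transformers)
# Note: the import path and method names below follow older AdapterHub
# documentation examples and may differ in current library versions.
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add and train a small task adapter instead of fine-tuning all weights.
model.add_adapter("sst2")
model.add_classification_head("sst2", num_labels=2)
model.train_adapter("sst2")        # freezes the base model, trains only the adapter

# After training, only the adapter (a few MB) needs to be stored or shared.
model.save_adapter("./sst2_adapter", "sst2")
```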

AdapterFusion: Non-destructive task composition for transfer learning

J Pfeiffer, A Kamath, A Rücklé, K Cho… - arXiv preprint arXiv …, 2020 - arxiv.org
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge
from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in …
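AdapterFusion composes several independently trained task adapters with a learned attention mechanism: the transformer layer output queries the adapter outputs and a softmax mixes them. The following is a simplified sketch of one fusion layer, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdapterFusionSketch(nn.Module):
    """Attention over the outputs of N frozen task adapters (simplified)."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, layer_out: torch.Tensor, adapter_outs: torch.Tensor) -> torch.Tensor:
        # layer_out: (batch, seq, dim); adapter_outs: (batch, seq, n_adapters, dim)
        q = self.query(layer_out).unsqueeze(2)              # (b, s, 1, d)
        k = self.key(adapter_outs)                           # (b, s, n, d)
        v = self.value(adapter_outs)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5         # (b, s, n)
        weights = scores.softmax(dim=-1).unsqueeze(-1)       # (b, s, n, 1)
        return layer_out + (weights * v).sum(dim=2)          # weighted mix + residual


fusion = AdapterFusionSketch(dim=768)
out = fusion(torch.randn(2, 16, 768), torch.randn(2, 16, 3, 768))
print(out.shape)   # torch.Size([2, 16, 768])
```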

Training neural networks with fixed sparse masks

YL Sung, V Nair, CA Raffel - Advances in Neural …, 2021 - proceedings.neurips.cc
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …
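The idea is to pick a fixed sparse subset of parameters up front (scored, for example, by an empirical Fisher approximation built from squared gradients) and update only those throughout training. The sketch below illustrates the masking step; the scoring and selection details are simplified assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn

def top_k_gradient_masks(model: nn.Module, loss: torch.Tensor, keep_fraction: float = 0.005):
    """Pick a fixed sparse set of parameters to update, scored by squared gradients
    (a rough empirical-Fisher proxy). Simplified sketch, not the paper's exact recipe."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    scores = torch.cat([g.pow(2).flatten() for g in grads])
    k = max(1, int(keep_fraction * scores.numel()))
    threshold = scores.topk(k).values.min()
    return [(g.pow(2) >= threshold).float() for g in grads]


# Usage: compute the masks once, then zero out gradients of unselected
# parameters before every optimizer step so only the fixed subset is trained.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
masks = top_k_gradient_masks(model, nn.functional.cross_entropy(model(x), y))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
with torch.no_grad():
    for p, m in zip(model.parameters(), masks):
        p.grad.mul_(m)   # keep gradients only for the fixed sparse mask
```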

Exploring the limits of transfer learning with a unified text-to-text transformer

C Raffel, N Shazeer, A Roberts, K Lee, S Narang… - Journal of machine …, 2020 - jmlr.org
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language …
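In the text-to-text framing, every task is expressed as feeding text in and generating text out, with the task selected by a textual prefix. A minimal example with a public T5 checkpoint:

```python
# pip install transformers sentencepiece
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is "text in, text out": the task is selected by a textual prefix.
prompt = "translate English to German: The house is wonderful."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```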