Recent advances in natural language processing via large pre-trained language models: A survey
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …
Continual lifelong learning in natural language processing: A survey
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …
AdaptFormer: Adapting vision transformers for scalable visual recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning
Few-shot in-context learning (ICL) enables pre-trained language models to perform a
previously-unseen task without any gradient-based training by feeding a small number of …
ST-Adapter: Parameter-efficient image-to-video transfer learning
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …
Beyond English-centric multilingual machine translation
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …
AdapterHub: A framework for adapting transformers
The current modus operandi in NLP involves downloading and fine-tuning pre-trained
models consisting of millions or billions of parameters. Storing and sharing such large …
AdapterFusion: Non-destructive task composition for transfer learning
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge
from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in …
Training neural networks with fixed sparse masks
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …
Exploring the limits of transfer learning with a unified text-to-text transformer
Transfer learning, where a model is first pre-trained on a data-rich task before being
fine-tuned on a downstream task, has emerged as a powerful technique in natural language …