Continual learning of large language models: A comprehensive survey

H Shi, Z Xu, H Wang, W Qin, W Wang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent success of large language models (LLMs) trained on static, pre-collected,
general datasets has sparked numerous research directions and applications. One such …

Continual learning of natural language processing tasks: A survey

Z Ke, B Liu - arXiv preprint arXiv:2211.12701, 2022 - arxiv.org
Continual learning (CL) is a learning paradigm that emulates the human capability of
learning and accumulating knowledge continually without forgetting the previously learned …

ERNIE-ViLG 2.0: Improving text-to-image diffusion model with knowledge-enhanced mixture-of-denoising-experts

Z Feng, Z Zhang, X Yu, Y Fang, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent progress in diffusion models has revolutionized the popular technology of text-to-
image generation. While existing approaches could produce photorealistic high-resolution …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …

Branch-train-merge: Embarrassingly parallel training of expert language models

M Li, S Gururangan, T Dettmers, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for
embarrassingly parallel training of large language models (LLMs). We show it is possible to …

SILO language models: Isolating legal risk in a nonparametric datastore

S Min, S Gururangan, E Wallace, W Shi… - arXiv preprint arXiv …, 2023 - arxiv.org
The legality of training language models (LMs) on copyrighted or otherwise restricted data is
under intense debate. However, as we show, model performance significantly degrades if …

Large language models (LLMs): survey, technical frameworks, and future challenges

P Kumar - Artificial Intelligence Review, 2024 - Springer
Artificial intelligence (AI) has significantly impacted various fields. Large language models
(LLMs) like GPT-4, BARD, PaLM, Megatron-Turing NLG, Jurassic-1 Jumbo etc., have …

Lifelong language pretraining with distribution-specialized experts

W Chen, Y Zhou, N Du, Y Huang… - International …, 2023 - proceedings.mlr.press
Pretraining on a large-scale corpus has become a standard method to build general
language models (LMs). Adapting a model to new data distributions targeting different …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

Lifelong pretraining: Continually adapting language models to emerging corpora

X Jin, D Zhang, H Zhu, W Xiao, SW Li, X Wei… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretrained language models (PTLMs) are typically learned over a large, static corpus and
further fine-tuned for various downstream tasks. However, when deployed in the real world …