Continual learning for large language models: A survey

T Wu, L Luo, YF Li, S Pan, TT Vu, G Haffari - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are not amenable to frequent re-training, due to high
training costs arising from their massive scale. However, updates are necessary to endow …

Recent advances of foundation language models-based continual learning: A survey

Y Yang, J Zhou, X Ding, T Huai, S Liu, Q Chen… - ACM Computing …, 2025 - dl.acm.org
Recently, foundation language models (LMs) have achieved significant success in the
domains of natural language processing and computer vision. Unlike traditional neural …

TIES-Merging: Resolving interference when merging models

P Yadav, D Tam, L Choshen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can
confer significant advantages, including improved downstream performance, faster …
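
As a quick illustration of the procedure the TIES-Merging title refers to, the sketch below shows the three commonly described steps: trim each task vector to its largest-magnitude entries, elect a per-parameter sign, and average only the entries that agree with that sign. The function name, arguments, and default hyperparameters here are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def ties_merge(base_state, expert_states, density=0.2, lam=1.0):
    """Merge fine-tuned checkpoints into the base model, TIES-style (sketch)."""
    merged = {}
    for name, base in base_state.items():
        # Task vectors: difference between each expert and the base weights.
        tvs = torch.stack([exp[name] - base for exp in expert_states])
        flat = tvs.view(tvs.shape[0], -1)

        # 1) Trim: keep only the top-`density` fraction of entries by magnitude.
        k = max(1, int(density * flat.shape[1]))
        thresh = flat.abs().kthvalue(flat.shape[1] - k + 1, dim=1, keepdim=True).values
        flat = torch.where(flat.abs() >= thresh, flat, torch.zeros_like(flat))

        # 2) Elect sign: per parameter, keep the sign with the larger total mass.
        elected = torch.sign(flat.sum(dim=0))

        # 3) Disjoint merge: average only entries whose sign matches the elected one.
        agree = torch.sign(flat) == elected
        summed = (flat * agree).sum(dim=0)
        counts = agree.sum(dim=0).clamp(min=1)
        merged_tv = (summed / counts).view_as(base)

        merged[name] = base + lam * merged_tv
    return merged
```

In use, `base_state` and each entry of `expert_states` would be floating-point `state_dict`s of checkpoints sharing the same architecture; the merged dictionary can then be loaded back into the base model.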

Exploring parameter-efficient fine-tuning techniques for code generation with large language models

M Weyssow, X Zhou, K Kim, D Lo… - ACM Transactions on …, 2023 - dl.acm.org
Large language models (LLMs) demonstrate impressive capabilities to generate accurate
code snippets given natural language intents in a zero-shot manner, i.e., without the need for …

A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
General-purpose large language models (LLMs), exemplified by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

J Parmar, S Satheesh, M Patwary, M Shoeybi… - arXiv preprint arXiv …, 2024 - arxiv.org
As language models have scaled both their number of parameters and pretraining dataset
sizes, the computational cost for pretraining has become intractable except for the most well …

Continual learning with pre-trained models: A survey

DW Zhou, HL Sun, J Ning, HJ Ye, DC Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Nowadays, real-world applications often face streaming data, which requires the learning
system to absorb new knowledge as data evolves. Continual Learning (CL) aims to achieve …

Simple and scalable strategies to continually pre-train large language models

A Ibrahim, B Thérien, K Gupta, ML Richter… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much more efficient solution is …

When to stop? Towards efficient code generation in LLMs with excess token prevention

L Guo, Y Wang, E Shi, W Zhong, H Zhang… - Proceedings of the 33rd …, 2024 - dl.acm.org
Code generation aims to automatically generate code snippets that meet given natural
language requirements and plays an important role in software development. Although …

What Matters for Model Merging at Scale?

P Yadav, T Vu, J Lai, A Chronopoulou… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging aims to combine multiple expert models into a more capable single model,
offering benefits such as reduced storage and serving costs, improved generalization, and …