Continual learning for large language models: A survey
Large language models (LLMs) are not amenable to frequent re-training due to high
training costs arising from their massive scale. However, updates are necessary to endow …
Recent advances of foundation language models-based continual learning: A survey
Recently, foundation language models (LMs) have marked significant achievements in the
domains of natural language processing and computer vision. Unlike traditional neural …
TIES-Merging: Resolving interference when merging models
Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can
confer significant advantages, including improved downstream performance, faster …
Exploring parameter-efficient fine-tuning techniques for code generation with large language models
Large language models (LLMs) demonstrate impressive capabilities to generate accurate
code snippets given natural language intents in a zero-shot manner, i.e., without the need for …
A survey of large language models for code: Evolution, benchmarking, and future trends
General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
As language models have scaled both their number of parameters and pretraining dataset
sizes, the computational cost for pretraining has become intractable except for the most well …
Continual learning with pre-trained models: A survey
Nowadays, real-world applications often face streaming data, which requires the learning
system to absorb new knowledge as data evolves. Continual Learning (CL) aims to achieve …
Simple and scalable strategies to continually pre-train large language models
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much more efficient solution is …
When to stop? Towards efficient code generation in LLMs with excess token prevention
Code generation aims to automatically generate code snippets that meet given natural
language requirements and plays an important role in software development. Although …
What Matters for Model Merging at Scale?
Model merging aims to combine multiple expert models into a more capable single model,
offering benefits such as reduced storage and serving costs, improved generalization, and …