Continual learning of large language models: A comprehensive survey

H Shi, Z Xu, H Wang, W Qin, W Wang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent success of large language models (LLMs) trained on static, pre-collected,
general datasets has sparked numerous research directions and applications. One such …

Continual learning of natural language processing tasks: A survey

Z Ke, B Liu - arXiv preprint arXiv:2211.12701, 2022 - arxiv.org
Continual learning (CL) is a learning paradigm that emulates the human capability of
learning and accumulating knowledge continually without forgetting the previously learned …

ERNIE-ViLG 2.0: Improving text-to-image diffusion model with knowledge-enhanced mixture-of-denoising-experts

Z Feng, Z Zhang, X Yu, Y Fang, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent progress in diffusion models has revolutionized the popular technology of text-to-
image generation. While existing approaches could produce photorealistic high-resolution …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …

Branch-train-merge: Embarrassingly parallel training of expert language models

M Li, S Gururangan, T Dettmers, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for
embarrassingly parallel training of large language models (LLMs). We show it is possible to …

SILO language models: Isolating legal risk in a nonparametric datastore

S Min, S Gururangan, E Wallace, W Shi… - arXiv preprint arXiv …, 2023 - arxiv.org
The legality of training language models (LMs) on copyrighted or otherwise restricted data is
under intense debate. However, as we show, model performance significantly degrades if …

Large language models (LLMs): survey, technical frameworks, and future challenges

P Kumar - Artificial Intelligence Review, 2024 - Springer
Artificial intelligence (AI) has significantly impacted various fields. Large language models
(LLMs) like GPT-4, BARD, PaLM, Megatron-Turing NLG, Jurassic-1 Jumbo etc., have …

Lifelong language pretraining with distribution-specialized experts

W Chen, Y Zhou, N Du, Y Huang… - International …, 2023 - proceedings.mlr.press
Pretraining on a large-scale corpus has become a standard method to build general
language models (LMs). Adapting a model to new data distributions targeting different …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

Lifelong pretraining: Continually adapting language models to emerging corpora

X Jin, D Zhang, H Zhu, W Xiao, SW Li, X Wei… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretrained language models (PTLMs) are typically learned over a large, static corpus and
further fine-tuned for various downstream tasks. However, when deployed in the real world …