Towards lifelong learning of large language models: A survey

J Zheng, S Qiu, C Shi, Q Ma - ACM Computing Surveys, 2024 - dl.acm.org
As the applications of large language models (LLMs) expand across diverse fields, their
ability to adapt to ongoing changes in data, tasks, and user preferences becomes crucial …

Authorship attribution in the era of LLMs: Problems, methodologies, and challenges

B Huang, C Chen, K Shu - ACM SIGKDD Explorations Newsletter, 2025 - dl.acm.org
Accurate attribution of authorship is crucial for maintaining the integrity of digital content,
improving forensic investigations, and mitigating the risks of misinformation and plagiarism …

ChatQA: Surpassing GPT-4 on conversational QA and RAG

Z Liu, W Ping, R Roy, P Xu, C Lee… - Advances in …, 2025 - proceedings.neurips.cc
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-
augmented generation (RAG) and conversational question answering (QA). To enhance …

DataComp-LM: In search of the next generation of training sets for language models

J Li, A Fang, G Smyrnis, M Ivgi, M Jordan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset
experiments with the goal of improving language models. As part of DCLM, we provide a …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence

B Peng, D Goldstein, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2024 - openreview.net
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving
upon the RWKV (RWKV-4) (Peng et al., 2023) architecture. Our architectural design …

Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …

Instruction pre-training: Language models are supervised multitask learners

D Cheng, Y Gu, S Huang, J Bi, M Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised multitask pre-training has been the critical method behind the recent success
of language models (LMs). However, supervised multitask learning still holds significant …

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Scaling laws for precision

T Kumar, Z Ankner, BF Spector, B Bordelon… - arXiv preprint arXiv …, 2024 - arxiv.org
Low precision training and inference affect both the quality and cost of language models, but
current scaling laws do not account for this. In this work, we devise "precision-aware" scaling …