From generation to judgment: Opportunities and challenges of LLM-as-a-Judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
Authorship attribution in the era of LLMs: Problems, methodologies, and challenges
Accurate attribution of authorship is crucial for maintaining the integrity of digital content,
improving forensic investigations, and mitigating the risks of misinformation and plagiarism …
DataComp-LM: In search of the next generation of training sets for language models
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset
experiments with the goal of improving language models. As part of DCLM, we provide a …
A survey of multimodal large language model from a data-centric perspective
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …
Language models scale reliably with over-training and on downstream tasks
Scaling laws are useful guides for derisking expensive training runs, as they predict
performance of large models using cheaper, small-scale experiments. However, there …
Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the
RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed …
ChatQA: Surpassing GPT-4 on conversational QA and RAG
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-
augmented generation (RAG) and conversational question answering (QA). To enhance …
Scaling laws for precision
Low precision training and inference affect both the quality and cost of language models, but
current scaling laws do not account for this. In this work, we devise "precision-aware" scaling …
Entropy law: The story behind data compression and LLM performance
Data is the cornerstone of large language models (LLMs), but not all data is useful for model
learning. Carefully selected data can better elicit the capabilities of LLMs with much less …
Training on the test task confounds evaluation and emergence
We study a fundamental problem in the evaluation of large language models that we call
training on the test task. Unlike wrongful practices like training on the test data, leakage, or …