The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …
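
As an illustration of the kind of method such frameworks unify, here is a minimal LoRA fine-tuning sketch using the Hugging Face peft library rather than LlamaFactory's own interface; the model name and hyperparameter values are illustrative assumptions.

```python
# Minimal LoRA sketch (Hugging Face peft, not LlamaFactory's API).
# Only the small low-rank adapter matrices are trained; the base
# weights stay frozen, which is what makes the fine-tuning "efficient".
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
config = LoraConfig(
    r=16,                                 # adapter rank (assumed value)
    lora_alpha=32,                        # adapter scaling (assumed value)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```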

From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline

T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of Large Language Models (LLMs) has outpaced the development of
model evaluation, highlighting the need for continuous curation of new, challenging …
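
A hedged sketch of the general curation idea (not the paper's exact criteria, judge, or thresholds): score crowdsourced prompts with an LLM judge against a list of quality criteria and keep only those satisfying several of them.

```python
# Hedged sketch of LLM-judged prompt curation. The criteria list,
# judge model, and threshold are illustrative assumptions, not the
# paper's exact BenchBuilder configuration.
from openai import OpenAI

client = OpenAI()
CRITERIA = ["specificity", "domain knowledge", "complexity", "problem-solving"]

def criteria_met(prompt: str) -> int:
    """Ask an LLM judge which quality criteria the prompt satisfies."""
    question = (
        "Which of these qualities does the user prompt below demonstrate? "
        f"Name each one that applies: {', '.join(CRITERIA)}.\n\nPrompt: {prompt}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    answer = reply.choices[0].message.content.lower()
    return sum(c in answer for c in CRITERIA)

crowd_prompts = ["What is 2+2?", "Design a lock-free MPMC queue in C++."]
benchmark = [p for p in crowd_prompts if criteria_met(p) >= 3]  # keep hard prompts
```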

Magpie: Alignment data synthesis from scratch by prompting aligned LLMs with nothing

Z Xu, F Jiang, L Niu, Y Deng, R Poovendran… - arXiv preprint arXiv …, 2024 - arxiv.org
High-quality instruction data is critical for aligning large language models (LLMs). Although
some models, such as Llama-3-Instruct, have open weights, their alignment data remain …
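
The title names the mechanism: because an aligned chat model has learned its chat template, supplying only the pre-query part of that template makes it autocomplete a plausible user instruction, which can then be answered to form a synthetic training pair. A minimal sketch, assuming the Llama-3-Instruct template and Hugging Face's text-generation pipeline:

```python
# Sketch of "prompting an aligned LLM with nothing": only the pre-query
# chat template is supplied, so the model invents the user turn itself.
# Model choice and sampling settings are illustrative assumptions.
from transformers import pipeline

generate = pipeline("text-generation",
                    model="meta-llama/Meta-Llama-3-8B-Instruct")

# Everything up to where a user query would normally begin.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

# Step 1: the model autocompletes a synthetic user instruction.
instruction = generate(pre_query, max_new_tokens=64, do_sample=True,
                       return_full_text=False)[0]["generated_text"]

# Step 2: feed that instruction back to obtain the response half
# of the (instruction, response) alignment pair.
prompt = (pre_query + instruction +
          "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n")
response = generate(prompt, max_new_tokens=256,
                    return_full_text=False)[0]["generated_text"]
```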

Simple and scalable strategies to continually pre-train large language models

A Ibrahim, B Thérien, K Gupta, ML Richter… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much more efficient solution is …
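
Two of the simple strategies studied in this line of work are re-warming the learning rate for the new data phase and replaying a small fraction of the old corpus. A minimal sketch, with the replay fraction and schedule constants as illustrative assumptions:

```python
# Sketch of continual pre-training with replay and LR re-warming.
# The 5% replay fraction and schedule constants are assumed values.
import math
import random

def mixed_stream(new_docs, old_docs, replay_frac=0.05):
    """Yield documents: mostly new data, plus a little replayed old data."""
    while True:
        pool = old_docs if random.random() < replay_frac else new_docs
        yield random.choice(pool)

def rewarmed_lr(step, warmup=1000, max_lr=3e-5, min_lr=3e-6, total=100_000):
    """Re-warm linearly from min_lr, then cosine-decay back down."""
    if step < warmup:
        return min_lr + (max_lr - min_lr) * step / warmup
    t = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```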

LiveBench: A challenging, contamination-free LLM benchmark

C White, S Dooley, M Roberts, A Pal, B Feuer… - arXiv preprint arXiv …, 2024 - arxiv.org
Test set contamination, wherein test data from a benchmark ends up in a newer model's
training set, is a well-documented obstacle for fair LLM evaluation and can quickly render …

Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement

A Yang, B Zhang, B Hui, B Gao, B Yu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present a series of math-specific large language models: Qwen2.5-Math
and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …

Unifying the perspectives of NLP and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong, H Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work we systematically review the recent advancements in software engineering with
language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 …

A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, maintenance, and
management of software applications underpinning the digital infrastructure of our modern …

Scaling synthetic data creation with 1,000,000,000 personas

T Ge, X Chan, X Wang, D Yu, H Mi, D Yu - arXiv preprint arXiv:2406.20094, 2024 - arxiv.org
We propose a novel persona-driven data synthesis methodology that leverages various
perspectives within a large language model (LLM) to create diverse synthetic data. To fully …
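
The mechanism is to prepend a persona description to an otherwise fixed task prompt, so the same model produces diverse outputs. A minimal sketch assuming an OpenAI-style chat API; the personas, task, and model name are illustrative stand-ins:

```python
# Sketch of persona-driven data synthesis: vary only the persona prefix
# to elicit diverse outputs for the same task. Persona texts, task, and
# model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
personas = [
    "a structural engineer who inspects suspension bridges",
    "a high-school teacher writing quiz questions",
    "a medieval historian focused on trade routes",
]

task = "Write a challenging math word problem."
for persona in personas:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"You are {persona}. {task}"}],
    )
    print(reply.choices[0].message.content)
```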