Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji, GI Winata… - arXiv preprint arXiv …, 2022 - arxiv.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Aya model: An instruction finetuned open-access multilingual language model

A Üstün, V Aryabumi, ZX Yong, WY Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent breakthroughs in large language models (LLMs) have centered around a handful of
data-rich languages. What does it take to broaden access to breakthroughs beyond first …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …

Aya 23: Open weight releases to further multilingual progress

V Aryabumi, J Dang, D Talupuru, S Dash… - arXiv preprint arXiv …, 2024 - arxiv.org
This technical report introduces Aya 23, a family of multilingual language models. Aya 23
builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a …

Exploring the benefits of training expert language models over instruction tuning

J Jang, S Kim, S Ye, D Kim… - International …, 2023 - proceedings.mlr.press
Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known
as multitask-prompted fine-tuning (MT), have shown capabilities to generalize to unseen …

Pile of Law: Learning responsible data filtering from the law and a 256GB open-source legal dataset

P Henderson, M Krass, L Zheng… - Advances in …, 2022 - proceedings.neurips.cc
One concern with the rise of large language models lies with their potential for significant
harm, particularly from pretraining on biased, obscene, copyrighted, and private information …

Silo language models: Isolating legal risk in a nonparametric datastore

S Min, S Gururangan, E Wallace, W Shi… - arXiv preprint arXiv …, 2023 - arxiv.org
The legality of training language models (LMs) on copyrighted or otherwise restricted data is
under intense debate. However, as we show, model performance significantly degrades if …

Adapters: A unified library for parameter-efficient and modular transfer learning

C Poth, H Sterz, I Paul, S Purkayastha… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Adapters, an open-source library that unifies parameter-efficient and modular
transfer learning in large language models. By integrating 10 diverse adapter methods into a …

Investigating cultural alignment of large language models

B AlKhamissi, M ElNokrashy, M AlKhamissi… - arXiv preprint arXiv …, 2024 - arxiv.org
The intricate relationship between language and culture has long been a subject of
exploration within the realm of linguistic anthropology. Large Language Models (LLMs) …