Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

Pretraining language models with human preferences

T Korbak, K Shi, A Chen, RV Bhalerao… - International …, 2023 - proceedings.mlr.press
Language models (LMs) are pretrained to imitate text from large and diverse datasets that contain content that would violate human preferences if generated by an LM …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer …

LoRA learns less and forgets less

D Biderman, J Portes, JJG Ortiz, M Paul… - … on Machine Learning …, 2024 - openreview.net
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low-rank perturbations to …
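For readers unfamiliar with the mechanism, the "low-rank perturbations" named in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration assuming a frozen nn.Linear base weight; the rank r, scaling alpha, and dimensions are illustrative choices, not the paper's settings.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base weight W plus a trainable low-rank update B @ A."""

        def __init__(self, in_features, out_features, r=8, alpha=16.0):
            super().__init__()
            self.base = nn.Linear(in_features, out_features, bias=False)
            self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
            # Only A and B are trained: r * (in + out) parameters instead of in * out.
            self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
            self.scaling = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    layer = LoRALinear(768, 768)
    print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])

Because only A and B receive gradients, optimizer state is kept for r * (in + out) parameters rather than the full in * out weight matrix, which is where the memory saving comes from.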

Conditional adapters: Parameter-efficient transfer learning with fast inference

T Lei, J Bai, S Brahma, J Ainslie… - Advances in …, 2023 - proceedings.neurips.cc
We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter …
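The "standard adapter" that CoDA generalizes is a small bottleneck module trained around a frozen pretrained model; a minimal PyTorch sketch follows. The bottleneck width and activation are illustrative assumptions, and CoDA's own conditional computation (routing only some tokens through the heavy frozen layers) is not shown here.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""

        def __init__(self, d_model=768, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(d_model, bottleneck)
            self.up = nn.Linear(bottleneck, d_model)
            self.act = nn.GELU()

        def forward(self, hidden):
            # The residual keeps the pretrained representation intact;
            # only the small down/up projections are trained.
            return hidden + self.up(self.act(self.down(hidden)))

    adapter = Adapter()
    print(adapter(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])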

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models use the power of Large Language Models to handle and respond to queries in multiple languages, achieving remarkable …

SLM: Bridge the thin gap between speech and text foundation models

M Wang, W Han, I Shafran, Z Wu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models …

Understanding and mitigating language confusion in LLMs

K Marchisio, WY Ko, A Bérard, T Dehaze… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate a surprising limitation of LLMs: their inability to consistently generate text in a
user's desired language. We create the Language Confusion Benchmark (LCB) to evaluate …

PrivacyMind: large language models can be contextual privacy protection learners

Y Xiao, Y Jin, Y Bai, Y Wu, X Yang, X Luo, W Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless …

QAmeleon: Multilingual QA with Only 5 Examples

P Agrawal, C Alberti, F Huot, J Maynez, J Ma… - Transactions of the …, 2023 - direct.mit.edu
The availability of large, high-quality datasets has been a major driver of recent progress in
question answering (QA). Such annotated datasets, however, are difficult and costly to …