On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores a simple method for improving the zero-shot learning abilities of
language models. We show that instruction tuning--finetuning language models on a …
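
A minimal sketch of the recipe this abstract describes: instruction tuning is supervised
finetuning on (instruction, response) pairs so the model learns to follow task descriptions.
The checkpoint ("gpt2") and the two toy pairs are illustrative assumptions, not the paper's
setup, which finetunes a much larger model on many NLP tasks phrased as instructions.

    # Instruction tuning as supervised finetuning on (instruction, response)
    # pairs; model and data below are placeholders, not the paper's setup.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")        # assumed base model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    pairs = [  # toy (instruction, response) examples
        ("Translate to French: Hello, world.", "Bonjour, le monde."),
        ("Label the sentiment of: I loved it.", "positive"),
    ]

    optim = torch.optim.AdamW(model.parameters(), lr=5e-5)
    for instruction, response in pairs:
        text = f"{instruction}\n{response}{tok.eos_token}"
        batch = tok(text, return_tensors="pt")
        # standard next-token loss over the concatenated pair
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optim.step()
        optim.zero_grad()

At inference time an unseen instruction is given zero-shot and the model's completion is
taken as the answer.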

End-to-end transformer-based models in textual-based NLP

A Rahali, MA Akhloufi - AI, 2023 - mdpi.com
Transformer architectures are highly expressive because they use self-attention
mechanisms to encode long-range dependencies in the input sequences. In this paper, we …
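
Since the abstract names self-attention as the source of this expressivity, here is a
minimal sketch of scaled dot-product self-attention (single head, no masking); dimensions
and random inputs are illustrative only.

    # Single-head scaled dot-product self-attention: each token's output is a
    # weighted mix of all tokens' values, so any long-range dependency is
    # reachable in one step.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])      # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)    # row-wise softmax
        return weights @ V

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                      # 5 tokens, d_model = 8
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)       # (5, 8)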

Language models are multilingual chain-of-thought reasoners

F Shi, M Suzgun, M Freitag, X Wang, S Srivats… - arXiv preprint arXiv …, 2022 - arxiv.org
We evaluate the reasoning abilities of large language models in multilingual settings. We
introduce the Multilingual Grade School Math (MGSM) benchmark by manually translating …

Linguistically inspired roadmap for building biologically reliable protein language models

MH Vu, R Akbar, PA Robert, B Swiatczak… - Nature Machine …, 2023 - nature.com
Deep neural-network-based language models (LMs) are increasingly applied to large-scale
protein sequence data to predict protein function. However, being largely black-box models …

MuRIL: Multilingual representations for Indian languages

S Khanuja, D Bansal, S Mehtani, S Khosla… - arXiv preprint arXiv …, 2021 - arxiv.org
India is a multilingual society with 1369 rationalized languages and dialects being spoken
across the country (INDIA, 2011). Of these, the 22 scheduled languages have a staggering …

Med-UNIC: Unifying cross-lingual medical vision-language pre-training by diminishing bias

Z Wan, C Liu, M Zhang, J Fu, B Wang… - Advances in …, 2024 - proceedings.neurips.cc
The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-
training (VLP). A potential solution lies in the combination of datasets from various language …

mGPT: Few-shot learners go multilingual

O Shliazhko, A Fenogenova, M Tikhonova… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent studies report that autoregressive language models can successfully solve many
NLP tasks via zero- and few-shot learning paradigms, which opens up new possibilities for …
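
For context on the few-shot paradigm the abstract refers to: task demonstrations are
concatenated into the prompt and an autoregressive model completes the final query, with no
gradient updates. The translation examples below are illustrative, not drawn from the mGPT
evaluation.

    # Few-shot prompting: the "training data" lives in the prompt itself.
    demonstrations = [
        ("Le chat dort.", "The cat is sleeping."),
        ("Il pleut beaucoup.", "It is raining a lot."),
    ]
    query = "J'aime le café."

    prompt = "".join(
        f"French: {src}\nEnglish: {tgt}\n\n" for src, tgt in demonstrations
    )
    prompt += f"French: {query}\nEnglish:"
    print(prompt)  # fed to the LM; its completion is taken as the answer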

LLM-powered data augmentation for enhanced cross-lingual performance

C Whitehouse, M Choudhury, AF Aji - arXiv preprint arXiv:2305.14288, 2023 - arxiv.org
This paper explores the potential of leveraging Large Language Models (LLMs) for data
augmentation in multilingual commonsense reasoning datasets where the available training …
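
One way the augmentation loop described here can look in practice, sketched under the
assumption of an OpenAI-compatible chat API; the paper's actual prompts, generator models,
and filtering steps may differ.

    # Prompt an LLM for synthetic examples in the style of a seed item, then
    # append the generations to the low-resource training set.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def augment(seed_example: str, n: int = 3) -> list[str]:
        prompt = (
            "Write one new commonsense-reasoning question in the style of:\n"
            f"{seed_example}\n"
            "Return only the question."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",      # assumed model; any chat model works
            messages=[{"role": "user", "content": prompt}],
            n=n,
            temperature=0.9,          # higher temperature for diverse outputs
        )
        return [choice.message.content for choice in resp.choices]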

Language models are few-shot multilingual learners

GI Winata, A Madotto, Z Lin, R Liu, J Yosinski… - arXiv preprint arXiv …, 2021 - arxiv.org
General-purpose language models have demonstrated impressive capabilities, performing
on par with state-of-the-art approaches on a range of downstream natural language …