Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …

Language-specific neurons: The key to multilingual capabilities in large language models

T Tang, W Luo, H Huang, D Zhang, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) demonstrate remarkable multilingual capabilities without
being pre-trained on specially curated multilingual parallel corpora. It remains a challenging …

Language embeddings sometimes contain typological generalizations

R Östling, M Kurfalı - Computational Linguistics, 2023 - direct.mit.edu
To what extent can neural network models learn generalizations about language structure,
and how do we find out what they have learned? We explore these questions by training …

The role of typological feature prediction in NLP and linguistics

J Bjerva - Computational Linguistics, 2024 - direct.mit.edu
Computational typology has gained traction in the field of Natural Language Processing
(NLP) in recent years, as evidenced by the increasing number of papers on the topic and the …

Data-driven cross-lingual syntax: An agreement study with massively multilingual models

AG Varda, M Marelli - Computational Linguistics, 2023 - direct.mit.edu
Massively multilingual models such as mBERT and XLM-R are increasingly valued in
Natural Language Processing research and applications, due to their ability to tackle the …

Phylogeny-inspired adaptation of multilingual models to new languages

F Faisal, A Anastasopoulos - arXiv preprint arXiv:2205.09634, 2022 - arxiv.org
Large pretrained multilingual models, trained on dozens of languages, have delivered
promising results due to their cross-lingual learning capabilities on a variety of language tasks …

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

T Kojima, I Okimura, Y Iwasawa, H Yanaka… - arXiv preprint arXiv …, 2024 - arxiv.org
Current decoder-based pre-trained language models (PLMs) successfully demonstrate
multilingual capabilities. However, it is unclear how these models handle multilingualism …

Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps

G Attanasio, B Savoldi, D Fucci, D Hovy - arXiv preprint arXiv:2402.17954, 2024 - arxiv.org
Current voice recognition approaches use multi-task, multilingual models for speech tasks
like Automatic Speech Recognition (ASR) to make them applicable to many languages …

Interpreting arithmetic mechanism in large language models through comparative neuron analysis

Z Yu, S Ananiadou - arXiv preprint arXiv:2409.14144, 2024 - arxiv.org
We find arithmetic ability resides within a limited number of attention heads, with each head
specializing in distinct operations. To delve into the reason, we introduce the Comparative …

Causal analysis of syntactic agreement neurons in multilingual language models

A Mueller, Y Xia, T Linzen - arXiv preprint arXiv:2210.14328, 2022 - arxiv.org
Structural probing work has found evidence for latent syntactic information in pre-trained
language models. However, much of this analysis has focused on monolingual models, and …