AMMUS: A survey of transformer-based pretrained models in natural language processing
KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
Understanding LLMs: A comprehensive overview from training to inference
The introduction of ChatGPT has led to a significant increase in the utilization of Large
Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on …
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …
XLS-R: Self-supervised cross-lingual speech representation learning at scale
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
Language models are multilingual chain-of-thought reasoners
We evaluate the reasoning abilities of large language models in multilingual settings. We
introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating …
M3Exam: A multilingual, multimodal, multilevel benchmark for examining large language models
Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …
Modular deep learning
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …
mSLAM: Massively multilingual joint pre-training for speech and text
We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual
cross-modal representations of speech and text by pre-training jointly on large amounts of …
Charformer: Fast character transformers via gradient-based subword tokenization
State-of-the-art models in natural language processing rely on separate rigid subword
tokenization algorithms, which limit their generalization ability and adaptation to new …
JGLUE: Japanese general language understanding evaluation
To develop high-performance natural language understanding (NLU) models, it is
necessary to have a benchmark to evaluate and analyze NLU ability from various …