AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

Putting GPT-4o to the sword: A comprehensive evaluation of language, vision, speech, and multimodal proficiency

S Shahriar, BD Lund, NR Mannuru, MA Arshad… - Applied Sciences, 2024 - mdpi.com
As large language models (LLMs) continue to advance, evaluating their comprehensive
capabilities becomes increasingly important for their application in various fields. This research study …

The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

Multilingual denoising pre-training for neural machine translation

Y Liu, J Gu, N Goyal, X Li, S Edunov… - Transactions of the …, 2020 - direct.mit.edu
This paper demonstrates that multilingual denoising pre-training produces significant
performance gains across a wide variety of machine translation (MT) tasks. We present …

InfoXLM: An information-theoretic framework for cross-lingual language model pre-training

Z Chi, L Dong, F Wei, N Yang, S Singhal… - arXiv preprint arXiv …, 2020 - arxiv.org
In this work, we present an information-theoretic framework that formulates cross-lingual
language model pre-training as maximizing mutual information between multilingual-multi …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models leverage powerful Large Language Models to handle
and respond to queries in multiple languages, achieving remarkable …

Accelerating transformer inference for translation via parallel decoding

A Santilli, S Severino, E Postolache, V Maiorca… - arXiv preprint arXiv …, 2023 - arxiv.org
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
The community proposed specific network architectures and learning-based methods to …

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

XLM-E: Cross-lingual language model pre-training via ELECTRA

Z Chi, S Huang, L Dong, S Ma, B Zheng… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-
training. Specifically, we present two pre-training tasks, namely multilingual replaced token …