GPTEval: A survey on assessments of ChatGPT and GPT-4

R Mao, G Chen, X Zhang, F Guerin… - arXiv preprint arXiv …, 2023 - arxiv.org
The emergence of ChatGPT has generated much speculation in the press about its potential
to disrupt social and economic systems. Its astonishing language ability has aroused strong …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji, GI Winata… - arXiv preprint arXiv …, 2022 - arxiv.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Aya dataset: An open-access collection for multilingual instruction tuning

S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

Maestro: Matched speech text representations through modality matching

Z Chen, Y Zhang, A Rosenberg… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Maestro, a self-supervised training method to unify representations learnt from
speech and text modalities. Self-supervised learning from speech signals aims to learn the …

IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages

J Gala, PA Chitale, R AK, V Gumma… - arXiv preprint arXiv …, 2023 - arxiv.org
India has a rich linguistic landscape with languages from 4 major language families spoken
by over a billion people. 22 of these languages are listed in the Constitution of India …

Zero-shot cross-lingual summarization via large language models

J Wang, Y Liang, F Meng, B Zou, Z Li, J Qu… - arXiv preprint arXiv …, 2023 - arxiv.org
Given a document in a source language, cross-lingual summarization (CLS) aims to
generate a summary in a different target language. Recently, the emergence of Large …

A survey on cross-lingual summarization

J Wang, F Meng, D Zheng, Y Liang, Z Li… - Transactions of the …, 2022 - direct.mit.edu
Cross-lingual summarization is the task of generating a summary in one language (e.g.,
English) for the given document(s) in a different language (e.g., Chinese). Under the …