Multi-task learning in natural language processing: An overview

S Chen, Y Zhang, Q Yang - ACM Computing Surveys, 2024 - dl.acm.org
Deep learning approaches have achieved great success in the field of Natural Language
Processing (NLP). However, directly training deep neural models often suffers from overfitting …

How good are GPT models at machine translation? A comprehensive evaluation

A Hendy, M Abdelrehim, A Sharaf, V Raunak… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for
natural language generation, but their performance for machine translation has not been …

MADLAD-400: A multilingual and document-level large audited dataset

S Kudugunta, I Caswell, B Zhang… - Advances in …, 2023 - proceedings.neurips.cc
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …

DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale

RY Aminabadi, S Rajbhandari, AA Awan… - … Conference for High …, 2022 - ieeexplore.ieee.org
The landscape of transformer model inference is increasingly diverse in model size, model
characteristics, latency and throughput requirements, hardware requirements, etc. With such …

DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale

S Rajbhandari, C Li, Z Yao, M Zhang… - International …, 2022 - proceedings.mlr.press
As the training of giant dense models hits the boundary on the availability and capability of
the hardware resources today, Mixture-of-Experts (MoE) models have become one of the …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

To repeat or not to repeat: Insights from scaling LLM under token-crisis

F Xue, Y Fu, W Zhou, Z Zheng… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent research has highlighted the importance of dataset size in scaling language models.
However, large language models (LLMs) are notoriously token-hungry during pre-training …

LLMCarbon: Modeling the end-to-end carbon footprint of large language models

A Faiz, S Kaneda, R Wang, R Osi, P Sharma… - arXiv preprint arXiv …, 2023 - arxiv.org
The carbon footprint associated with large language models (LLMs) is a significant concern,
encompassing emissions from their training, inference, experimentation, and storage …

Scaling vision-language models with sparse mixture of experts

S Shen, Z Yao, C Li, T Darrell, K Keutzer… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of natural language processing (NLP) has made significant strides in recent years,
particularly in the development of large-scale vision-language models (VLMs). These …

IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages

J Gala, PA Chitale, R AK, V Gumma… - arXiv preprint arXiv …, 2023 - arxiv.org
India has a rich linguistic landscape with languages from 4 major language families spoken
by over a billion people. 22 of these languages are listed in the Constitution of India …