Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A review of current trends, techniques, and challenges in large language models (LLMs)

R Patil, V Gudivada - Applied Sciences, 2024 - mdpi.com
Natural language processing (NLP) has been significantly transformed over the last decade,
especially in the field of language modeling. Large language models (LLMs) have achieved …

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

Mixtral of experts

AQ Jiang, A Sablayrolles, A Roux, A Mensch… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has
the same architecture as Mistral 7B, with the difference that each layer is composed of 8 …
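The snippet cuts off at the layer description; per the paper, each layer holds 8 expert feed-forward blocks and a router activates 2 of them per token. Below is a minimal sketch of such a top-2 sparse MoE layer. All shapes are made up, and plain ReLU FFNs stand in for the real expert blocks; this illustrates the routing idea, not Mixtral's actual implementation.

```python
# Sketch of a sparse Mixture-of-Experts feed-forward layer: 8 experts,
# each token routed to its top-2. Illustrative only; shapes are invented.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 64, 8, 2

# One weight pair per expert (a plain 2-layer FFN for illustration).
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02
W_gate = rng.normal(size=(d_model, n_experts)) * 0.02  # router weights

def moe_ffn(x):
    """Route a single token x (d_model,) to its top-k experts."""
    logits = x @ W_gate                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best experts
    weights = np.exp(logits[top])          # softmax over selected experts only
    weights /= weights.sum()
    # Only the selected experts run, so per-token compute stays near
    # top_k / n_experts of the full 8-expert layer.
    out = np.zeros(d_model)
    for w, e in zip(weights, top):
        h = np.maximum(x @ W_in[e], 0.0)   # ReLU FFN here; SwiGLU in practice
        out += w * (h @ W_out[e])
    return out

token = rng.normal(size=d_model)
print(moe_ffn(token).shape)  # (16,)
```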

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …
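The abstract names the setting rather than a technique, so purely to illustrate what "efficient generative inference" involves, here is a hedged sketch of key/value caching, one standard ingredient in low-latency autoregressive decoding. The paper itself studies far more than this (partitioning strategies, batching, and so on); every name and shape below is invented for a toy single attention head.

```python
# KV caching: at each decode step only the new token's query is computed,
# while keys/values from earlier steps are reused instead of recomputed,
# making each step O(t) rather than reprocessing the whole O(t^2) prefix.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x):
    """Attend the new token x (d,) over all cached positions plus itself."""
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)  # (t, d)
    q = x @ Wq
    scores = K @ q / np.sqrt(d)                  # one score per cached position
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V

for _ in range(4):                               # generate 4 steps
    out = decode_step(rng.normal(size=d))
print(out.shape, len(k_cache))                   # (8,) 4
```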

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models

D Dai, C Deng, C Zhao, RX Xu, H Gao, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for
managing computational costs when scaling up model parameters. However, conventional …
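To make the cost argument in the snippet concrete: in an MoE layer, total parameters grow with the number of experts, while per-token compute tracks only the activated experts. A back-of-the-envelope sketch follows; the expert count and sizes are hypothetical, not DeepSeekMoE's actual configuration.

```python
# Illustration of why MoE "manages computational costs when scaling up
# model parameters": capacity scales with n_experts, compute with top_k.
# All sizes below are made up for illustration.
d_model, d_ff = 4096, 14336
n_experts, top_k = 64, 6          # hypothetical fine-grained setup

params_per_expert = 2 * d_model * d_ff   # in + out projection of one FFN expert
total = n_experts * params_per_expert    # parameters the layer stores
active = top_k * params_per_expert       # parameters one token actually touches

print(f"total expert params: {total / 1e9:.1f}B")
print(f"active per token:    {active / 1e9:.2f}B "
      f"({100 * top_k / n_experts:.0f}% of the layer)")
```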

RAPHAEL: Text-to-image generation via large mixture of diffusion paths

Z Xue, G Song, Q Guo, B Liu, Z Zong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Text-to-image generation has recently witnessed remarkable achievements. We introduce a
text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images …