A review of current trends, techniques, and challenges in large language models (LLMs)

R Patil, V Gudivada - Applied Sciences, 2024 - mdpi.com
Natural language processing (NLP) has undergone a significant transformation over the last decade,
especially in the field of language modeling. Large language models (LLMs) have achieved …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
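
The survey covers standard inference-time optimizations for transformer networks. As a purely illustrative sketch (not code from the survey), the snippet below shows one such technique, a key/value cache for autoregressive decoding; all names and shapes are hypothetical stand-ins for a trained model's projections.

```python
# Minimal KV-cache sketch: keep past keys/values so each decode step attends
# over all previous tokens without recomputing their projections.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    def __init__(self, d_model):
        self.k = np.zeros((0, d_model))
        self.v = np.zeros((0, d_model))

    def step(self, q_t, k_t, v_t):
        # Append this step's key/value, then attend the new query over the cache.
        self.k = np.vstack([self.k, k_t])
        self.v = np.vstack([self.v, v_t])
        scores = q_t @ self.k.T / np.sqrt(q_t.shape[-1])  # (1, t)
        return softmax(scores) @ self.v                   # (1, d_model)

# One token at a time, with random projections standing in for a real model.
d = 64
cache = KVCache(d)
for _ in range(5):
    q, k, v = (np.random.randn(1, d) for _ in range(3))
    out = cache.step(q, k, v)
print(out.shape)  # (1, 64)
```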

The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only

G Penedo, Q Malartic, D Hesslow, R Cojocaru… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models are commonly trained on a mixture of filtered web data and curated
high-quality corpora, such as social media conversations, books, or technical papers. This …
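
The abstract argues that properly filtered and deduplicated web data alone can match curated corpora. As a hypothetical illustration of heuristic quality filtering in the spirit of such pipelines, the sketch below applies a few document-level rules; the thresholds and rules are my own assumptions, not the paper's actual filter set.

```python
# Hypothetical web-text quality filter: length bounds, symbol ratio, and a
# minimum fraction of alphabetic "words". Thresholds are illustrative only.
import re

def passes_quality_filters(doc: str,
                           min_words: int = 50,
                           max_words: int = 100_000,
                           max_symbol_ratio: float = 0.1,
                           min_alpha_word_ratio: float = 0.8) -> bool:
    words = doc.split()
    if not (min_words <= len(words) <= max_words):
        return False
    # Reject documents dominated by markup-like symbols.
    symbols = sum(1 for ch in doc if ch in "#{}[]<>|")
    if symbols / max(len(doc), 1) > max_symbol_ratio:
        return False
    # Require that most tokens contain at least one alphabetic character.
    alpha_words = sum(1 for w in words if re.search(r"[A-Za-z]", w))
    if alpha_words / len(words) < min_alpha_word_ratio:
        return False
    return True

docs = ["This is a short test.", "Plenty of ordinary English prose " * 20]
print([passes_quality_filters(d) for d in docs])  # [False, True]
```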

eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers

Y Balaji, S Nah, X Huang, A Vahdat, J Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Large-scale diffusion-based generative models have led to breakthroughs in text-
conditioned high-resolution image synthesis. Starting from random noise, such text-to-image …
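
The snippet describes sampling from random noise with denoisers specialized to different stages of the reverse process. A toy sketch of that routing idea follows, assuming stand-in denoiser functions and a hypothetical two-expert split; it is not the paper's actual architecture, schedule, or number of experts.

```python
# Toy sketch: route each denoising step to an "expert" chosen by noise level,
# so high-noise and low-noise regimes are handled by specialized networks.
import numpy as np

def expert_high_noise(x, t):   # stand-in for a network trained on early, noisy steps
    return x * 0.9
def expert_low_noise(x, t):    # stand-in for a network trained on late, cleaner steps
    return x * 0.99

def select_expert(t, num_steps):
    # Hypothetical routing rule: first half of the trajectory -> high-noise expert.
    return expert_high_noise if t > num_steps // 2 else expert_low_noise

def sample(shape=(8, 8), num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in range(num_steps, 0, -1):
        denoiser = select_expert(t, num_steps)
        x = denoiser(x, t)                    # one reverse-diffusion step
    return x

print(sample().std())
```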

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
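
The core building block described here is Int8 matrix multiplication with per-vector scaling. Below is a minimal NumPy sketch of vector-wise absmax quantization for a matmul; the paper's full method additionally handles emergent outlier features with a mixed-precision decomposition, which this illustration omits.

```python
# Vector-wise absmax Int8 quantization sketch: quantize rows of A and columns
# of B to int8, accumulate in int32, then rescale back to float.
import numpy as np

def quantize_rows(x):
    # Per-row absmax scaling into int8; returns quantized tensor and scales.
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    return np.round(x / scale).astype(np.int8), scale

def int8_matmul(a, b):
    a_q, a_s = quantize_rows(a)      # quantize rows of a: (m, k)
    b_q, b_s = quantize_rows(b.T)    # quantize columns of b via rows of b.T
    acc = a_q.astype(np.int32) @ b_q.T.astype(np.int32)   # int32 accumulation
    return acc * (a_s @ b_s.T)                            # dequantize

a = np.random.randn(4, 16).astype(np.float32)
b = np.random.randn(16, 8).astype(np.float32)
err = np.abs(int8_matmul(a, b) - a @ b).max()
print(f"max abs error vs fp32 matmul: {err:.4f}")
```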

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Training compute-optimal large language models

J Hoffmann, S Borgeaud, A Mensch… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
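
The abstract is about choosing model size and token count under a fixed compute budget. As a back-of-the-envelope sketch, the snippet below applies the commonly cited approximation C ≈ 6·N·D (training FLOPs for N parameters and D tokens) together with the roughly 20-tokens-per-parameter reading of the paper's result; the exact constants come from fitted scaling laws, so treat the numbers as indicative only.

```python
# Compute-optimal sizing sketch: with C = 6*N*D and D = r*N (r tokens per
# parameter), the optimal N under a FLOP budget C is sqrt(C / (6*r)).
def compute_optimal(flops_budget, tokens_per_param=20.0):
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for budget in (1e21, 1e23, 5.76e23):   # the last is roughly Chinchilla's budget
    n, d = compute_optimal(budget)
    print(f"C={budget:.2e} FLOPs -> ~{n/1e9:.1f}B params, ~{d/1e9:.0f}B tokens")
```

With the ~5.76e23 FLOP budget this recovers roughly 70B parameters and ~1.4T tokens, consistent with the model configuration reported in the paper.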

D4: Improving LLM pretraining via document de-duplication and diversification

K Tirumala, D Simig, A Aghajanyan… - Advances in Neural …, 2023 - proceedings.neurips.cc
Over recent years, an increasing amount of compute and data has been poured into training
large language models (LLMs), usually by doing one-pass learning on as many tokens as …
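
The de-duplication half of the title refers to pruning redundant documents from the pretraining pool. The sketch below is a simplified, assumption-laden illustration of near-duplicate removal by embedding cosine similarity; the actual D4 pipeline clusters document embeddings and additionally diversifies by dropping the most prototypical points, which this toy version does not do.

```python
# Greedy near-duplicate pruning over document embeddings: keep a document
# only if it is not too similar (cosine) to any document already kept.
import numpy as np

def dedup_by_cosine(embeddings, threshold=0.95):
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    kept = []
    for i, vec in enumerate(unit):
        if all(vec @ unit[j] < threshold for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
docs = rng.standard_normal((100, 32))
docs[50:] = docs[:50] + 0.01 * rng.standard_normal((50, 32))  # inject near-duplicates
print(len(dedup_by_cosine(docs)))  # ~50 survivors out of 100
```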