Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Understanding LLMs: A comprehensive overview from training to inference

Y Liu, H He, T Han, X Zhang, M Liu, J Tian, Y Zhang… - Neurocomputing, 2024 - Elsevier
The introduction of ChatGPT has led to a significant increase in the utilization of Large
Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on …

DoRA: Weight-decomposed low-rank adaptation

SY Liu, CY Wang, H Yin, P Molchanov… - … on Machine Learning, 2024 - openreview.net
Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its
variants have gained considerable popularity because of avoiding additional inference …
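
For context on the LoRA family that DoRA and its variants build on: below is a minimal pure-Python sketch of LoRA's low-rank weight update, delta_W = (alpha / r) * B @ A, added to a frozen pretrained weight. All dimensions and values here are illustrative, not taken from the paper.

```python
def matmul(A, B):
    """Naive matrix multiply, sufficient for small illustrative matrices."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def lora_delta(B, A, alpha, r):
    """LoRA's low-rank update: delta_W = (alpha / r) * (B @ A)."""
    scale = alpha / r
    return [[scale * x for x in row] for row in matmul(B, A)]

# Frozen pretrained weight W (4 x 4 identity for illustration) plus a
# rank-1 adapter; only B (d_out x r) and A (r x d_in) are trained.
d_out, d_in, r, alpha = 4, 4, 1, 2
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
B = [[0.5] for _ in range(d_out)]
A = [[0.1] * d_in for _ in range(r)]
delta = lora_delta(B, A, alpha, r)
W_adapted = [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]
```

Because the update is a product of two small matrices, only (d_out + d_in) * r parameters are trained instead of d_out * d_in, and the delta can be merged into W after training so no extra inference cost remains.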

Simulating 500 million years of evolution with a language model

T Hayes, R Rao, H Akin, NJ Sofroniew, D Oktay, Z Lin… - Science, 2025 - science.org
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …

Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Sheared LLaMA: Accelerating language model pre-training via structured pruning

M **a, T Gao, Z Zeng, D Chen - arxiv preprint arxiv:2310.06694, 2023 - arxiv.org
The popularity of LLaMA (Touvron et al., 2023a; b) and other recently emerged moderate-
sized large language models (LLMs) highlights the potential of building smaller yet powerful …

YaRN: Efficient context window extension of large language models

B Peng, J Quesnelle, H Fan, E Shippole - arXiv preprint arXiv:2309.00071, 2023 - arxiv.org
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional
information in transformer-based language models. However, these models fail to …
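
The RoPE mechanism this entry extends can be sketched as follows: pairs of vector dimensions are rotated by position-dependent angles, so attention dot products depend only on relative position. This is a minimal illustration, not the paper's code; the base value is the conventional default.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at position `pos`.
    Each dimension pair (2i, 2i+1) is rotated by pos * base**(-2i/d)."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * c - x2 * s
        out[2 * i + 1] = x1 * s + x2 * c
    return out

# Rotation preserves the vector's norm, and dot(rope(q, m), rope(k, n))
# depends only on m - n, which is how relative position enters attention.
q = [1.0, 0.0, 1.0, 0.0]
q_rot = rope(q, pos=3)
```

Context-extension methods such as the one in this entry work by rescaling the rotation angles so positions beyond the training length map into the range the model saw during pre-training.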

DistriFusion: Distributed parallel inference for high-resolution diffusion models

M Li, T Cai, J Cao, Q Zhang, H Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have achieved great success in synthesizing high-quality images.
However, generating high-resolution images with diffusion models is still challenging due to …

OLMo: Accelerating the science of language models

D Groeneveld, I Beltagy, P Walsh, A Bhagia… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) have become ubiquitous in both NLP research and in commercial
product offerings. As their commercial importance has surged, the most powerful models …

Autoregressive model beats diffusion: Llama for scalable image generation

P Sun, Y Jiang, S Chen, S Zhang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LlamaGen, a new family of image generation models that apply the original "next-
token prediction" paradigm of large language models to the visual generation domain. It is an …