Priors in Bayesian deep learning: A review

V Fortuin - International Statistical Review, 2022 - Wiley Online Library
While the choice of prior is one of the most critical parts of the Bayesian inference workflow,
recent Bayesian deep learning models have often fallen back on vague priors, such as …
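
As a purely illustrative sketch of what such a vague prior usually amounts to in practice, the snippet below places an independent zero-mean Gaussian on every weight; this is an assumption for illustration, not code from the review.

    # Illustrative only: an isotropic Gaussian ("vague") prior over all network weights,
    # the kind of default the review says Bayesian deep learning often falls back on.
    import torch
    import torch.distributions as dist

    def log_prior(parameters, sigma=1.0):
        # Independent N(0, sigma^2) on every weight; encodes essentially no domain knowledge.
        prior = dist.Normal(0.0, sigma)
        return sum(prior.log_prob(p).sum() for p in parameters)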

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning

Q Zhang, M Chen, A Bukharin… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-tuning large pre-trained language models on downstream tasks has become an
important paradigm in NLP. However, common practice fine-tunes all of the parameters in a …
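
For context, the "budget allocation" in the title is performed over an SVD-style parameterisation of each weight update, whose prunable "singular values" control how much rank a layer receives. The sketch below is illustrative PyTorch under that assumption, not the authors' implementation, and omits the importance scoring that drives the pruning.

    # Hedged sketch of an SVD-style adapter with a prunable rank budget (illustrative,
    # not the released AdaLoRA code): Delta W = P diag(lam) Q, where entries of lam can
    # be zeroed out to reassign rank across layers.
    import torch

    class SVDAdapter(torch.nn.Module):
        def __init__(self, d_out, d_in, r):
            super().__init__()
            self.P = torch.nn.Parameter(torch.randn(d_out, r) * 0.01)
            self.lam = torch.nn.Parameter(torch.zeros(r))  # prunable "singular values"
            self.Q = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)

        def delta_w(self):
            return self.P @ torch.diag(self.lam) @ self.Q  # (d_out, d_in) weight update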

LESS: Selecting influential data for targeted instruction tuning

M **a, S Malladi, S Gururangan, S Arora… - arxiv preprint arxiv …, 2024 - arxiv.org
Instruction tuning has unlocked powerful capabilities in large language models (LLMs),
effectively using combined datasets to develop general-purpose chatbots. However, real …
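
In heavily simplified form, the selection criterion can be thought of as scoring each candidate training example by how well its gradient aligns with a gradient computed on the target task. The sketch below is an assumption-laden illustration that omits the paper's random projection, adapter gradients, and optimizer-aware normalization.

    # Heavily simplified illustration of gradient-alignment data selection (not the
    # LESS implementation): score = cosine similarity between gradients.
    import torch

    def influence_score(example_grad: torch.Tensor, target_grad: torch.Tensor) -> float:
        # Higher similarity -> the example is (heuristically) more useful for the target task.
        return torch.nn.functional.cosine_similarity(example_grad, target_grad, dim=0).item()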

LoRA: Low-rank adaptation of large language models

EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang… - ICLR, 2022 - arxiv.org
The dominant paradigm of natural language processing consists of large-scale pre-training
on general domain data and adaptation to particular tasks or domains. As we pre-train larger …
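
A minimal sketch of the low-rank adaptation the title refers to, written as illustrative PyTorch rather than the authors' released code: the pre-trained weight is frozen and only a rank-r update B A is trained.

    # Illustrative LoRA layer (not the authors' released code): the frozen pre-trained
    # projection W0 is augmented with a trainable low-rank update scaled by alpha / r.
    import torch

    class LoRALinear(torch.nn.Module):
        def __init__(self, d_in, d_out, r=8, alpha=16):
            super().__init__()
            self.W0 = torch.nn.Linear(d_in, d_out, bias=False)
            self.W0.weight.requires_grad_(False)                # frozen pre-trained weight
            self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at step 0
            self.scale = alpha / r

        def forward(self, x):
            return self.W0(x) + self.scale * (x @ self.A.T @ self.B.T)

Because only A and B receive gradients, the trainable parameter count per adapted matrix drops from d_in * d_out to r * (d_in + d_out).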

Birth of a transformer: A memory viewpoint

A Bietti, V Cabannes, D Bouchacourt… - Advances in …, 2023 - proceedings.neurips.cc
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …

e3nn: Euclidean neural networks

M Geiger, T Smidt - arXiv preprint arXiv:2207.09453, 2022 - arxiv.org
We present e3nn, a generalized framework for creating E(3) equivariant trainable functions,
also known as Euclidean neural networks. e3nn naturally operates on geometry and …
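
As a reminder of the property such functions satisfy (the standard definition of equivariance, not wording taken from the paper): a function $f$ is $E(3)$-equivariant when transforming its input by a rigid motion transforms its output by the corresponding representation,
\[
  f(g \cdot \boldsymbol{x}) \;=\; \rho(g)\, f(\boldsymbol{x}) \qquad \text{for all } g \in E(3),
\]
where $\rho(g)$ is the representation of the rotation, reflection or translation $g$ acting on the output features.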

LoRA+: Efficient low rank adaptation of large models

S Hayou, N Ghosh, B Yu - arXiv preprint arXiv:2402.12354, 2024 - arxiv.org
In this paper, we show that Low Rank Adaptation (LoRA), as originally introduced in Hu
et al. (2021), leads to suboptimal fine-tuning of models with large width (embedding dimension) …
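
The proposed remedy is to train the two adapter matrices with different learning rates, giving $B$ a larger step size than $A$. The sketch below shows how such a split could be wired into an optimizer; the ratio is a placeholder rather than the paper's recommended value, and the parameter naming (attributes `A` and `B`, as in a typical LoRA module) is an assumption.

    # Illustrative sketch of LoRA+-style learning rates (placeholder ratio, assumed
    # parameter naming): the B matrices get a larger step size than the A matrices.
    import torch

    def lora_plus_optimizer(model, lr_A=1e-4, ratio=16.0):
        a_params = [p for n, p in model.named_parameters() if n.endswith(".A")]
        b_params = [p for n, p in model.named_parameters() if n.endswith(".B")]
        return torch.optim.AdamW([
            {"params": a_params, "lr": lr_A},
            {"params": b_params, "lr": lr_A * ratio},
        ])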

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
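
Written out, the setting is a two-layer network in which only the first-layer weights are moved by a single gradient step (the precise learning-rate scaling analysed in the paper is not reproduced here):
\[
  f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\,\boldsymbol{a}^{\top}\sigma(\boldsymbol{W}\boldsymbol{x}),
  \qquad
  \boldsymbol{W}^{(1)} = \boldsymbol{W}^{(0)} - \eta\,\nabla_{\boldsymbol{W}}\widehat{\mathcal{L}}\big(\boldsymbol{W}^{(0)}\big),
\]
with $\widehat{\mathcal{L}}$ the empirical training loss and the second-layer vector $\boldsymbol{a}$ held fixed during that step.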

A kernel-based view of language model fine-tuning

S Malladi, A Wettig, D Yu, D Chen… - … on Machine Learning, 2023 - proceedings.mlr.press
It has become standard to solve NLP tasks by fine-tuning pre-trained language models
(LMs), especially in low-data settings. There is minimal theoretical understanding of …
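
The kernel in question is the empirical neural tangent kernel of the pre-trained model; as a reminder of the standard definition (not a claim about the paper's exact formulation),
\[
  K(\boldsymbol{x}, \boldsymbol{x}') = \big\langle \nabla_{\theta} f(\boldsymbol{x}; \theta_0),\; \nabla_{\theta} f(\boldsymbol{x}'; \theta_0) \big\rangle,
\]
where $\theta_0$ denotes the pre-trained parameters; in this view, fine-tuning on a downstream task behaves approximately like kernel regression with $K$.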