Priors in Bayesian deep learning: A review

V Fortuin - International Statistical Review, 2022 - Wiley Online Library
While the choice of prior is one of the most critical parts of the Bayesian inference workflow,
recent Bayesian deep learning models have often fallen back on vague priors, such as …
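
As a purely illustrative sketch of what such a vague prior usually amounts to in practice, the snippet below places an independent zero-mean Gaussian on every weight; this is an assumption for illustration, not code from the review.

    # Illustrative only: an isotropic Gaussian ("vague") prior over all network weights,
    # the kind of default the review says Bayesian deep learning often falls back on.
    import torch
    import torch.distributions as dist

    def log_prior(parameters, sigma=1.0):
        # Independent N(0, sigma^2) on every weight; encodes essentially no domain knowledge.
        prior = dist.Normal(0.0, sigma)
        return sum(prior.log_prob(p).sum() for p in parameters)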

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning

Q Zhang, M Chen, A Bukharin… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-tuning large pre-trained language models on downstream tasks has become an
important paradigm in NLP. However, common practice fine-tunes all of the parameters in a …
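
For context, the "budget allocation" in the title is performed over an SVD-style parameterisation of each weight update, whose prunable "singular values" control how much rank a layer receives. The sketch below is illustrative PyTorch under that assumption, not the authors' implementation, and omits the importance scoring that drives the pruning.

    # Hedged sketch of an SVD-style adapter with a prunable rank budget (illustrative,
    # not the released AdaLoRA code): Delta W = P diag(lam) Q, where entries of lam can
    # be zeroed out to reassign rank across layers.
    import torch

    class SVDAdapter(torch.nn.Module):
        def __init__(self, d_out, d_in, r):
            super().__init__()
            self.P = torch.nn.Parameter(torch.randn(d_out, r) * 0.01)
            self.lam = torch.nn.Parameter(torch.zeros(r))  # prunable "singular values"
            self.Q = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)

        def delta_w(self):
            return self.P @ torch.diag(self.lam) @ self.Q  # (d_out, d_in) weight update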

LESS: Selecting influential data for targeted instruction tuning

M **a, S Malladi, S Gururangan, S Arora… - arxiv preprint arxiv …, 2024 - arxiv.org
Instruction tuning has unlocked powerful capabilities in large language models (LLMs),
effectively using combined datasets to develop general-purpose chatbots. However, real …
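
In heavily simplified form, the selection criterion can be thought of as scoring each candidate training example by how well its gradient aligns with a gradient computed on the target task. The sketch below is an assumption-laden illustration that omits the paper's random projection, adapter gradients, and optimizer-aware normalization.

    # Heavily simplified illustration of gradient-alignment data selection (not the
    # LESS implementation): score = cosine similarity between gradients.
    import torch

    def influence_score(example_grad: torch.Tensor, target_grad: torch.Tensor) -> float:
        # Higher similarity -> the example is (heuristically) more useful for the target task.
        return torch.nn.functional.cosine_similarity(example_grad, target_grad, dim=0).item()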

LoRA: Low-rank adaptation of large language models

EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang… - ICLR, 2022 - arxiv.org
The dominant paradigm of natural language processing consists of large-scale pre-training
on general domain data and adaptation to particular tasks or domains. As we pre-train larger …
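
A minimal sketch of the low-rank adaptation the title refers to, written as illustrative PyTorch rather than the authors' released code: the pre-trained weight is frozen and only a rank-r update B A is trained.

    # Illustrative LoRA layer (not the authors' released code): the frozen pre-trained
    # projection W0 is augmented with a trainable low-rank update scaled by alpha / r.
    import torch

    class LoRALinear(torch.nn.Module):
        def __init__(self, d_in, d_out, r=8, alpha=16):
            super().__init__()
            self.W0 = torch.nn.Linear(d_in, d_out, bias=False)
            self.W0.weight.requires_grad_(False)                # frozen pre-trained weight
            self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at step 0
            self.scale = alpha / r

        def forward(self, x):
            return self.W0(x) + self.scale * (x @ self.A.T @ self.B.T)

Because only A and B receive gradients, the trainable parameter count per adapted matrix drops from d_in * d_out to r * (d_in + d_out).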

Birth of a transformer: A memory viewpoint

A Bietti, V Cabannes, D Bouchacourt… - Advances in …, 2023 - proceedings.neurips.cc
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …

e3nn: Euclidean neural networks

M Geiger, T Smidt - arXiv preprint arXiv:2207.09453, 2022 - arxiv.org
We present e3nn, a generalized framework for creating E(3) equivariant trainable functions,
also known as Euclidean neural networks. e3nn naturally operates on geometry and …
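
As a reminder of the property such functions satisfy (the standard definition of equivariance, not wording taken from the paper): a function $f$ is $E(3)$-equivariant when transforming its input by a rigid motion transforms its output by the corresponding representation,
\[
  f(g \cdot \boldsymbol{x}) \;=\; \rho(g)\, f(\boldsymbol{x}) \qquad \text{for all } g \in E(3),
\]
where $\rho(g)$ is the representation of the rotation, reflection or translation $g$ acting on the output features.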

LoRA+: Efficient low rank adaptation of large models

S Hayou, N Ghosh, B Yu - arXiv preprint arXiv:2402.12354, 2024 - arxiv.org
In this paper, we show that Low Rank Adaptation (LoRA), as originally introduced in Hu
et al. (2021), leads to suboptimal fine-tuning of models with large width (embedding dimension) …
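
The proposed remedy is to train the two adapter matrices with different learning rates, giving $B$ a larger step size than $A$. The sketch below shows how such a split could be wired into an optimizer; the ratio is a placeholder rather than the paper's recommended value, and the parameter naming (attributes `A` and `B`, as in a typical LoRA module) is an assumption.

    # Illustrative sketch of LoRA+-style learning rates (placeholder ratio, assumed
    # parameter naming): the B matrices get a larger step size than the A matrices.
    import torch

    def lora_plus_optimizer(model, lr_A=1e-4, ratio=16.0):
        a_params = [p for n, p in model.named_parameters() if n.endswith(".A")]
        b_params = [p for n, p in model.named_parameters() if n.endswith(".B")]
        return torch.optim.AdamW([
            {"params": a_params, "lr": lr_A},
            {"params": b_params, "lr": lr_A * ratio},
        ])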

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
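
Written out, the setting is a two-layer network in which only the first-layer weights are moved by a single gradient step (the precise learning-rate scaling analysed in the paper is not reproduced here):
\[
  f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\,\boldsymbol{a}^{\top}\sigma(\boldsymbol{W}\boldsymbol{x}),
  \qquad
  \boldsymbol{W}^{(1)} = \boldsymbol{W}^{(0)} - \eta\,\nabla_{\boldsymbol{W}}\widehat{\mathcal{L}}\big(\boldsymbol{W}^{(0)}\big),
\]
with $\widehat{\mathcal{L}}$ the empirical training loss and the second-layer vector $\boldsymbol{a}$ held fixed during that step.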

A kernel-based view of language model fine-tuning

S Malladi, A Wettig, D Yu, D Chen… - … on Machine Learning, 2023 - proceedings.mlr.press
It has become standard to solve NLP tasks by fine-tuning pre-trained language models
(LMs), especially in low-data settings. There is minimal theoretical understanding of …
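
The kernel in question is the empirical neural tangent kernel of the pre-trained model; as a reminder of the standard definition (not a claim about the paper's exact formulation),
\[
  K(\boldsymbol{x}, \boldsymbol{x}') = \big\langle \nabla_{\theta} f(\boldsymbol{x}; \theta_0),\; \nabla_{\theta} f(\boldsymbol{x}'; \theta_0) \big\rangle,
\]
where $\theta_0$ denotes the pre-trained parameters; in this view, fine-tuning on a downstream task behaves approximately like kernel regression with $K$.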