AlphaPruning: Using heavy-tailed self-regularization theory for improved layer-wise pruning of large language models
Recent work on pruning large language models (LLMs) has shown that one can eliminate a
large number of parameters without compromising performance, making pruning a …
Are Gaussian data all you need? The extents and limits of universality in high-dimensional generalized linear estimation
In this manuscript we consider the problem of generalized linear estimation on Gaussian
mixture data with labels given by a single-index model. Our first result is a sharp asymptotic …
A theory of non-linear feature learning with one gradient step in two-layer neural networks
Feature learning is thought to be one of the fundamental reasons for the success of deep
neural networks. It is rigorously known that in two-layer fully-connected neural networks …
Temperature balancing, layer-wise weight analysis, and neural network training
Regularization in modern machine learning is crucial, and it can take various forms in
algorithmic design: training set, model family, error function, regularization terms, and …
DiffDomain enables identification of structurally reorganized topologically associating domains
Topologically associating domains (TADs) are critical structural units in the three-dimensional
genome organization of mammalian genomes. Dynamic reorganizations of TADs between …
Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
We develop a stochastic differential equation, called homogenized SGD, for analyzing the
dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares …
Sketched ridgeless linear regression: The role of downsampling
Overparametrization often helps improve the generalization performance. This paper
presents a dual view of overparametrization suggesting that downsampling may also help …
Demystifying disagreement-on-the-line in high dimensions
Evaluating the performance of machine learning models under distribution shifts is
challenging, especially when we only have unlabeled data from the shifted (target) domain …
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality
Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), are known
to enhance training efficiency in Large Language Models (LLMs). Due to the limited …
" Lossless" compression of deep neural networks: a high-dimensional neural tangent kernel approach
Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the
price of increased depth and having more parameters per layer, making their training and …