The emergence of essential sparsity in large pre-trained models: The weights that matter

A Jaiswal, S Liu, T Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large pre-trained transformers are show-stealers in modern-day deep learning,
and it becomes crucial to comprehend the parsimonious patterns that exist within them as …

Instant soup: Cheap pruning ensembles in a single pass can draw lottery tickets from large models

AK Jaiswal, S Liu, T Chen, Y Ding… - … on Machine Learning, 2023 - proceedings.mlr.press
Large pre-trained transformers have been receiving explosive attention in the past few
years, due to their acculturation for numerous downstream applications via fine-tuning, but …

Dynamic sparse no training: Training-free fine-tuning for sparse LLMs

Y Zhang, L Zhao, M Lin, Y Sun, Y Yao, X Han… - arxiv preprint arxiv …, 2023 - arxiv.org
The ever-increasing large language models (LLMs), though opening a potential path for the
upcoming artificial general intelligence, sadly drop a daunting obstacle on the way towards …

Ten lessons we have learned in the new "sparseland": A short handbook for sparse neural network researchers

S Liu, Z Wang - arxiv preprint arxiv:2302.02596, 2023 - arxiv.org
This article does not propose any novel algorithm or new hardware for sparsity. Instead, it
aims to serve the "common good" for the increasingly prosperous Sparse Neural Network …

Learning scalable model soup on a single gpu: An efficient subspace training strategy

T Li, W Jiang, F Liu, X Huang, JT Kwok - European Conference on …, 2024 - Springer
Pre-training followed by fine-tuning is widely adopted among practitioners. The performance
can be improved by “model soups” via exploring various hyperparameter configurations …

Domain-generalizable multiple-domain clustering

A Rozner, B Battash, L Wolf, O Lindenbaum - arxiv preprint arxiv …, 2023 - arxiv.org
Accurately clustering high-dimensional measurements is vital for adequately analyzing
scientific data. Deep learning machinery has remarkably improved clustering capabilities in …

Federated and edge learning for large language models

F Piccialli, D Chiaro, P Qi, V Bellandi, E Damiani - Information Fusion, 2025 - Elsevier
As the demand for sophisticated language models (LMs) continues to grow, the necessity to
deploy them efficiently across federated and edge environments becomes increasingly …

Sequential bayesian neural subnetwork ensembles

S Jantre, S Bhattacharya, NM Urban, BJ Yoon… - arxiv preprint arxiv …, 2022 - arxiv.org
Deep ensembles have emerged as a powerful technique for improving predictive
performance and enhancing model robustness across various applications by leveraging …

SEVEN: Pruning Transformer Model by Reserving Sentinels

J Xiao, P Li, J Nie, Z Tang - arxiv preprint arxiv:2403.12688, 2024 - arxiv.org
Large-scale Transformer models (TM) have demonstrated outstanding performance across
various tasks. However, their considerable parameter size restricts their applicability …

Unveiling the Intertwined Relationship Between Essential Sparsity and Robustness in Large Pre-trained Models

S Shin, A Jaiswal, S Liu, Z Wang - 2024 - openreview.net
In the era of pre-trained LLMs, understanding their intrinsic sparse patterns becomes
paramount, especially in the context of their scalability and efficiency. Recently, Jaiswal et …