The emergence of essential sparsity in large pre-trained models: The weights that matter

A Jaiswal, S Liu, T Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large pre-trained transformers are show-stealers in modern-day deep learning,
and it becomes crucial to comprehend the parsimonious patterns that exist within them as …

Instant soup: Cheap pruning ensembles in a single pass can draw lottery tickets from large models

AK Jaiswal, S Liu, T Chen, Y Ding… - … on Machine Learning, 2023 - proceedings.mlr.press
Large pre-trained transformers have been receiving explosive attention in the past few
years, due to their acculturation for numerous downstream applications via fine-tuning, but …

Dynamic sparse no training: Training-free fine-tuning for sparse LLMs

Y Zhang, L Zhao, M Lin, Y Sun, Y Yao, X Han… - arxiv preprint arxiv …, 2023 - arxiv.org
The ever-increasing large language models (LLMs), though opening a potential path for the
upcoming artificial general intelligence, sadly drop a daunting obstacle on the way towards …

Ten lessons we have learned in the new "sparseland": A short handbook for sparse neural network researchers

S Liu, Z Wang - arxiv preprint arxiv:2302.02596, 2023 - arxiv.org
This article does not propose any novel algorithm or new hardware for sparsity. Instead, it
aims to serve the "common good" for the increasingly prosperous Sparse Neural Network …

Learning scalable model soup on a single gpu: An efficient subspace training strategy

T Li, W Jiang, F Liu, X Huang, JT Kwok - European Conference on …, 2024 - Springer
Pre-training followed by fine-tuning is widely adopted among practitioners. The performance
can be improved by “model soups” via exploring various hyperparameter configurations …

Domain-generalizable multiple-domain clustering

A Rozner, B Battash, L Wolf, O Lindenbaum - arxiv preprint arxiv …, 2023 - arxiv.org
Accurately clustering high-dimensional measurements is vital for adequately analyzing
scientific data. Deep learning machinery has remarkably improved clustering capabilities in …

Federated and edge learning for large language models

F Piccialli, D Chiaro, P Qi, V Bellandi, E Damiani - Information Fusion, 2025 - Elsevier
As the demand for sophisticated language models (LMs) continues to grow, the necessity to
deploy them efficiently across federated and edge environments becomes increasingly …

Sequential bayesian neural subnetwork ensembles

S Jantre, S Bhattacharya, NM Urban, BJ Yoon… - arxiv preprint arxiv …, 2022 - arxiv.org
Deep ensembles have emerged as a powerful technique for improving predictive
performance and enhancing model robustness across various applications by leveraging …

SEVEN: Pruning Transformer Model by Reserving Sentinels

J Xiao, P Li, J Nie, Z Tang - arxiv preprint arxiv:2403.12688, 2024 - arxiv.org
Large-scale Transformer models (TM) have demonstrated outstanding performance across
various tasks. However, their considerable parameter size restricts their applicability …

Unveiling the Intertwined Relationship Between Essential Sparsity and Robustness in Large Pre-trained Models

S Shin, A Jaiswal, S Liu, Z Wang - 2024 - openreview.net
In the era of pre-trained LLMs, understanding their intrinsic sparse patterns becomes
paramount, especially in the context of their scalability and efficiency. Recently, Jaiswal et …