A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu, Y Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, RB Joshi… - The Thirty-eighth …, 2024 - openreview.net
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

A survey on the memory mechanism of large language model based agents

Z Zhang, X Bo, C Ma, R Li, X Chen, Q Dai, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) based agents have recently attracted much attention from the
research and industry communities. Compared with original LLMs, LLM-based agents are …

Topic Modeling for Faster Literature Screening Using Transformer-Based Embeddings

C Galli, C Cusano, M Meleti, N Donos, E Calciolari - Metrics, 2024 - mdpi.com
Systematic reviews are a powerful tool to summarize the existing evidence in medical
literature. However, identifying relevant articles is difficult, and this typically involves …

The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems

L Song, Z Pang, W Wang, Z Wang, XF Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The wide deployment of Large Language Models (LLMs) has given rise to strong demands
for optimizing their inference performance. Today's techniques serving this purpose primarily …

Topic Analysis of the Literature Reveals the Research Structure: A Case Study in Periodontics

C Galli, MT Colangelo, M Meleti… - Big Data and …, 2025 - qmro.qmul.ac.uk
Periodontics is a complex field characterized by a constantly growing body of research,
which poses a challenge for researchers and stakeholders striving to stay abreast of the …

A Survey on Large Language Model Acceleration based on KV Cache Management

H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …

Topic Analysis of the Literature Reveals

C Galli, MT Colangelo, M Meleti, S Guizzardi… - 2024 - preprints.org
Periodontics is a complex field characterized by a constantly growing body of research,
posing a challenge for researchers and stakeholders striving to stay abreast of its evolving …