A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
Large language model inference acceleration: A comprehensive hardware perspective
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …
Compact language models via pruning and knowledge distillation
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
A survey on the memory mechanism of large language model based agents
Large language model (LLM) based agents have recently attracted much attention from the
research and industry communities. Compared with original LLMs, LLM-based agents are …
Topic Modeling for Faster Literature Screening Using Transformer-Based Embeddings
Systematic reviews are a powerful tool to summarize the existing evidence in medical
literature. However, identifying relevant articles is difficult, and this typically involves …
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
The wide deployment of Large Language Models (LLMs) has given rise to strong demands
for optimizing their inference performance. Today's techniques serving this purpose primarily …
Topic Analysis of the Literature Reveals the Research Structure: A Case Study in Periodontics
C Galli, MT Colangelo, M Meleti… - Big Data and …, 2025 - qmro.qmul.ac.uk
Periodontics is a complex field characterized by a constantly growing body of research,
which poses a challenge for researchers and stakeholders striving to stay abreast of the …
A Survey on Large Language Model Acceleration based on KV Cache Management
H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …
Topic Analysis of the Literature Reveals
C Galli, MT Colangelo, M Meleti, S Guizzardi… - 2024 - preprints.org
Periodontics is a complex field characterized by a constantly growing body of research,
posing a challenge for researchers and stakeholders striving to stay abreast of its evolving …