- Academic Search

A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

Cited by 122

[PDF] arxiv.org

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu, Y Lian… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …

Cited by 6

[PDF] openreview.net

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, RB Joshi… - The Thirty-eighth …, 2024 - openreview.net

Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …

Cited by 37

[PDF] arxiv.org

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Cited by 73

[PDF] arxiv.org

A survey on the memory mechanism of large language model based agents

Z Zhang, X Bo, C Ma, R Li, X Chen, Q Dai, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language model (LLM) based agents have recently attracted much attention from the
research and industry communities. Compared with original LLMs, LLM-based agents are …

Cited by 59

[HTML] mdpi.com

[HTML] Topic Modeling for Faster Literature Screening Using Transformer-Based Embeddings

C Galli, C Cusano, M Meleti, N Donos, E Calciolari - Metrics, 2024 - mdpi.com

Systematic reviews are a powerful tool to summarize the existing evidence in medical
literature. However, identifying relevant articles is difficult, and this typically involves …

Cited by 3

[PDF] arxiv.org

The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems

L Song, Z Pang, W Wang, Z Wang, XF Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

The wide deployment of Large Language Models (LLMs) has given rise to strong demands
for optimizing their inference performance. Today's techniques serving this purpose primarily …

Cited by 1

[PDF] qmul.ac.uk

Topic Analysis of the Literature Reveals the Research Structure: A Case Study in Periodontics

C Galli, MT Colangelo, M Meleti… - Big Data and …, 2025 - qmro.qmul.ac.uk

Periodontics is a complex field characterized by a constantly growing body of research,
which poses a challenge for researchers and stakeholders striving to stay abreast of the …

[PDF] arxiv.org

A Survey on Large Language Model Acceleration based on KV Cache Management

H Li, Y Li, A Tian, T Tang, Z Xu, X Chen, N Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …


[PDF] preprints.org

[PDF] Topic Analysis of the Literature Reveals

C Galli, MT Colangelo, M Meleti, S Guizzardi… - 2024 - preprints.org

Periodontics is a complex field characterized by a constantly growing body of research,
posing a challenge for researchers and stakeholders striving to stay abreast of its evolving …
