Approximate computing survey, Part II: Application-specific & architectural approximation techniques and applications

V Leon, MA Hanif, G Armeniakos, X Jiao… - ACM Computing …, 2023 - dl.acm.org
The challenging deployment of compute-intensive applications from domains such as
Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of …

Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Rptq: Reorder-based post-training quantization for large language models

Z Yuan, L Niu, J Liu, W Liu, X Wang, Y Shang… - arxiv preprint arxiv …, 2023 - arxiv.org

Juno: Optimizing high-dimensional approximate nearest neighbour search with sparsity-aware algorithm and ray-tracing core mapping

Z Liu, W Ni, J Leng, Y Feng, C Guo, Q Chen… - Proceedings of the 29th …, 2024 - dl.acm.org
Approximate nearest neighbor (ANN) search is a widely applied technique in modern
intelligent applications, such as recommendation systems and vector databases. Therefore …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku… - IEEE Circuits and …, 2025 - ieeexplore.ieee.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

vtensor: Flexible virtual tensor management for efficient LLM serving

J Xu, R Zhang, C Guo, W Hu, Z Liu, F Wu… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are widely used across various domains, processing
millions of daily requests. This surge in demand poses significant challenges in optimizing …