Approximate computing survey, Part II: Application-specific & architectural approximation techniques and applications

V Leon, MA Hanif, G Armeniakos, X Jiao… - ACM Computing …, 2023 - dl.acm.org
The challenging deployment of compute-intensive applications from domains such as
Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of …

Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …

Rptq: Reorder-based post-training quantization for large language models

Z Yuan, L Niu, J Liu, W Liu, X Wang, Y Shang… - arxiv preprint arxiv …, 2023 - arxiv.org

Juno: Optimizing high-dimensional approximate nearest neighbour search with sparsity-aware algorithm and ray-tracing core mapping

Z Liu, W Ni, J Leng, Y Feng, C Guo, Q Chen… - Proceedings of the 29th …, 2024 - dl.acm.org
Approximate nearest neighbor (ANN) search is a widely applied technique in modern
intelligent applications, such as recommendation systems and vector databases. Therefore …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku… - IEEE Circuits and …, 2025 - ieeexplore.ieee.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

vtensor: Flexible virtual tensor management for efficient LLM serving

J Xu, R Zhang, C Guo, W Hu, Z Liu, F Wu… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are widely used across various domains, processing
millions of daily requests. This surge in demand poses significant challenges in optimizing …