Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

Talking heads: Understanding inter-layer communication in transformer language models

J Merullo, C Eickhoff, E Pavlick - Advances in Neural …, 2025 - proceedings.neurips.cc
Although it is known that transformer language models (LMs) pass features from early layers
to later layers, it is not well understood how this information is represented and routed by the …

SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models

M Li, Y Lin, Z Zhang, T Cai, X Li, J Guo, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

Compressing large language models using low rank and low precision decomposition

R Saha, N Sagan, V Srivastava… - Advances in …, 2025 - proceedings.neurips.cc
The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy
them on memory-constrained edge devices. This work introduces CALDERA -- a new …

Low-rank quantization-aware training for LLMs

Y Bondarenko, R Del Chiaro, M Nagel - arXiv preprint arXiv:2406.06385, 2024 - arxiv.org
Large language models (LLMs) are omnipresent; however, their practical deployment is
challenging due to their ever-increasing computational and memory demands. Quantization …

Lottery ticket adaptation: Mitigating destructive interference in LLMs

A Panda, B Isik, X Qi, S Koyejo, T Weissman… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing methods for adapting large language models (LLMs) to new tasks are not suited to
multi-task adaptation because they modify all the model weights--causing destructive …

A fine-tuning enhanced RAG system with quantized influence measure as AI judge

K Rangan, Y Yin - Scientific Reports, 2024 - nature.com
This study presents an innovative enhancement to retrieval-augmented generation (RAG)
systems by seamlessly integrating fine-tuned large language models (LLMs) with vector …

Fast matrix multiplications for lookup table-quantized LLMs

H Guo, W Brandon, R Cholakov… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of large language models (LLMs) is often constrained by memory
bandwidth, where the primary bottleneck is the cost of transferring model parameters from …