Review of image classification algorithms based on convolutional neural networks

L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has long been a popular research direction worldwide, and the
emergence of deep learning has accelerated the development of this field. Convolutional …

Distilling knowledge via knowledge review

P Chen, S Liu, H Zhao, J Jia - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Knowledge distillation transfers knowledge from the teacher network to the student
one, with the goal of greatly improving the performance of the student network. Previous …
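
The snippet refers to the teacher-to-student transfer at the heart of knowledge distillation. A minimal sketch of the standard soft-label objective that such work builds on (not the review-based feature distillation this particular paper proposes; temperature T and mixing weight alpha are illustrative hyperparameters) might look like:

```python
# Minimal sketch of the standard logit-distillation loss (illustrative only).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend a soft KL term (teacher vs. student at temperature T)
    with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 10-class problem.
s = torch.randn(8, 10)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))
```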

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration

J Lin, J Tang, H Tang, S Yang… - Proceedings of …, 2024 - proceedings.mlsys.org
Large language models (LLMs) have shown excellent performance on various tasks, but the
astronomical model size raises the hardware barrier for serving (memory size) and slows …
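
The entry concerns weight-only quantization for LLM serving. As a rough, generic sketch, symmetric per-group INT4 rounding of a weight vector can be written as below; the activation-aware scale search that gives AWQ its name is not reproduced, and the group size is an assumption.

```python
# Generic group-wise weight-only quantization sketch (not the cited method itself).
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=128):
    """Symmetric per-group quantization of a 1-D weight vector."""
    qmax = 2 ** (n_bits - 1) - 1
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_groupwise(w)
err = np.abs(dequantize(q, s).reshape(-1) - w).mean()
print(f"mean abs quantization error: {err:.4f}")
```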

H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
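
The "heavy-hitter oracle" in the title refers to keeping only the most-attended KV-cache entries during long generation. A toy eviction policy under an assumed fixed budget, keeping the top-scoring older entries plus a recency window (the budget, window size, and scoring are illustrative assumptions), could look like:

```python
# Toy "heavy hitter" KV-cache eviction sketch (policy details are assumptions).
import torch

def evict_kv(keys, values, attn_history, budget=64, recent=16):
    """keys/values: [seq, dim]; attn_history: [seq] accumulated attention mass."""
    seq = keys.shape[0]
    if seq <= budget:
        return keys, values, attn_history
    recent_idx = torch.arange(seq - recent, seq)
    older_scores = attn_history[: seq - recent]
    topk = torch.topk(older_scores, k=budget - recent).indices
    keep = torch.cat([topk.sort().values, recent_idx])
    return keys[keep], values[keep], attn_history[keep]

k = torch.randn(200, 64)
v = torch.randn(200, 64)
hist = torch.rand(200)
k2, v2, h2 = evict_kv(k, v, hist)
print(k2.shape)  # torch.Size([64, 64])
```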

SmoothQuant: Accurate and efficient post-training quantization for large language models

G Xiao, J Lin, M Seznec, H Wu… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) show excellent performance but are compute- and memory-
intensive. Quantization can reduce memory and accelerate inference. However, existing …
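
The smoothing idea commonly associated with this line of work is to divide each activation channel by a per-channel factor s and fold s into the following weight matrix, leaving the layer's output mathematically unchanged while shrinking activation outliers. A minimal sketch, where the statistics and the alpha exponent used are assumptions:

```python
# Activation-smoothing sketch: Y = (X / s) @ (diag(s) W) equals X @ W.
import torch

def smooth(x, w, alpha=0.5):
    """x: [tokens, in_features], w: [in_features, out_features]."""
    act_max = x.abs().amax(dim=0)   # per input channel of the activation
    w_max = w.abs().amax(dim=1)     # per input channel of the weight
    s = (act_max.pow(alpha) / w_max.pow(1 - alpha)).clamp(min=1e-5)
    return x / s, w * s.unsqueeze(1)

x = torch.randn(32, 512) * torch.rand(512) * 10   # uneven channel scales
w = torch.randn(512, 512)
x_s, w_s = smooth(x, w)
print((x @ w - x_s @ w_s).abs().max().item())       # output unchanged (up to fp error)
print(x.abs().max().item(), x_s.abs().max().item())  # activation outliers shrink
```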

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
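
Contextual sparsity means that, for a given input, only a small input-dependent subset of heads or MLP neurons is actually needed. A toy sketch that computes only the top-k hidden units selected by a cheap score (the paper uses a learned predictor; this stand-in score is an assumption):

```python
# Toy contextual-sparsity sketch: compute only k of h MLP hidden units per input.
import torch

def sparse_mlp(x, w1, w2, k):
    """x: [d], w1: [d, h], w2: [h, d]; compute only k hidden neurons."""
    scores = x @ w1                           # stand-in for a learned predictor
    idx = torch.topk(scores.abs(), k).indices
    h = torch.relu(x @ w1[:, idx])            # [k]
    return h @ w2[idx, :]                     # [d]

d, hdim = 64, 256
x = torch.randn(d)
w1, w2 = torch.randn(d, hdim), torch.randn(hdim, d)
dense = torch.relu(x @ w1) @ w2
sparse = sparse_mlp(x, w1, w2, k=hdim // 4)
print(torch.nn.functional.cosine_similarity(dense, sparse, dim=0).item())
```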

The case for 4-bit precision: k-bit inference scaling laws

T Dettmers, L Zettlemoyer - International Conference on …, 2023 - proceedings.mlr.press
Quantization methods reduce the number of bits required to represent each parameter in a
model, trading accuracy for smaller memory footprints and lower inference latency. However, the …
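
The trade-off the snippet describes can be made concrete with a toy experiment: uniform k-bit quantization shrinks storage to roughly k/32 of fp32 while the reconstruction error grows as k falls (the numbers below are illustrative only, not the paper's results):

```python
# Toy illustration of the bit-width vs. error trade-off in uniform quantization.
import numpy as np

def quantize_kbit(w, k):
    qmax = 2 ** (k - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

w = np.random.randn(10000).astype(np.float32)
for k in (8, 4, 3, 2):
    err = np.abs(quantize_kbit(w, k) - w).mean()
    print(f"{k}-bit: {k / 32:.2f}x of fp32 size, mean abs error {err:.4f}")
```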

Scaling & shifting your features: A new baseline for efficient model tuning

D Lian, D Zhou, J Feng, X Wang - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-
tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …
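
The "scaling and shifting" baseline in the title sits between those two extremes: the pre-trained backbone stays frozen and only lightweight per-channel scale and shift parameters are trained. A minimal PyTorch sketch, where the placement and initialization of the scale/shift module are assumptions:

```python
# Sketch: freeze the backbone, train only per-channel scale (gamma) and shift (beta).
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return x * self.gamma + self.beta

backbone = nn.Sequential(nn.Linear(128, 128), ScaleShift(128),
                         nn.ReLU(), nn.Linear(128, 10))
for name, p in backbone.named_parameters():
    p.requires_grad = ("gamma" in name) or ("beta" in name)

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 256 out of ~18k total
```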

LLM-QAT: Data-free quantization-aware training for large language models

Z Liu, B Oguz, C Zhao, E Chang, P Stock… - arXiv preprint arXiv …, 2023 - arxiv.org
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …
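
Quantization-aware training, in contrast to the post-training methods the snippet mentions, simulates quantization in the forward pass while letting gradients flow to the full-precision weights via a straight-through estimator. A minimal sketch (the bit-width is an assumption and the paper's data-free distillation setup is not modeled):

```python
# Fake-quantization sketch with a straight-through estimator.
import torch

def fake_quant(w, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()   # forward sees q, backward treats rounding as identity

w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(8, 16)
loss = (x @ fake_quant(w)).pow(2).mean()
loss.backward()
print(w.grad.shape)   # gradients reach the full-precision weights
```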

On-device training under 256KB memory

J Lin, L Zhu, WM Chen, WC Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …
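
A common way to fit fine-tuning into a tight memory budget is to update only a small subset of parameters, such as biases and the classifier head, so optimizer state and saved activations stay small. The sketch below illustrates that generic idea only; the specific sparse-update and quantized-training techniques of the cited paper are not reproduced.

```python
# Toy memory-constrained fine-tuning sketch: train only biases and the head.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))
for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias") or name.startswith("4.")

opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-2)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(sum(p.numel() for p in model.parameters() if p.requires_grad), "params updated")
```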