Review of image classification algorithms based on convolutional neural networks
L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has long been an active research direction, and the
emergence of deep learning has accelerated progress in this field. Convolutional …
Distilling knowledge via knowledge review
Knowledge distillation transfers knowledge from the teacher network to the student
one, with the goal of greatly improving the performance of the student network. Previous …
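The paper's "knowledge review" mechanism distills across feature stages; as background only, the sketch below shows the classic logit-based distillation loss that such methods build on, with the temperature and weighting chosen arbitrarily for illustration rather than taken from the paper.

```python
# Minimal sketch of a classic logit-based distillation loss (Hinton-style),
# shown only to illustrate generic teacher-to-student transfer; the paper's
# review-based, cross-stage feature distillation is not reproduced here.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend a softened KL term against the teacher with the usual CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits.
s = torch.randn(8, 10)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(kd_loss(s, t, y).item())
```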
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
Large language models (LLMs) have shown excellent performance on various tasks, but the
astronomical model size raises the hardware barrier for serving (memory size) and slows …
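As a rough illustration of the activation-aware idea, the sketch below scales weight channels by a placeholder function of activation magnitude before round-to-nearest quantization; AWQ itself searches for the scales and uses grouped low-bit kernels, none of which is reproduced here.

```python
# Rough sketch of activation-aware weight quantization: scale salient weight
# channels (judged by activation magnitude) before low-bit rounding so they
# lose less precision. The scale rule is a placeholder, not AWQ's search.
import numpy as np

def quantize_weights_activation_aware(W, act_scale, n_bits=4):
    """W: (out, in) weights; act_scale: per-input-channel activation magnitude."""
    s = np.sqrt(act_scale + 1e-8)               # placeholder per-channel scaling
    W_scaled = W * s                            # protect salient channels
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(W_scaled).max(axis=1, keepdims=True) / qmax + 1e-8
    W_q = np.clip(np.round(W_scaled / step), -qmax - 1, qmax)
    return W_q * step / s                       # dequantize and fold the scale back out

W = np.random.randn(16, 64)
act_scale = np.abs(np.random.randn(64))
print(np.abs(W - quantize_weights_activation_aware(W, act_scale)).mean())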
H2O: Heavy-hitter oracle for efficient generative inference of large language models
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
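The sketch below shows one generic heavy-hitter style eviction policy for a KV cache: keep the most recent entries plus the entries with the largest accumulated attention mass, drop the rest. The budgets and scoring are illustrative placeholders, not H2O's exact oracle.

```python
# Minimal sketch of heavy-hitter style KV-cache eviction.
import numpy as np

def evict_kv(acc_attention, cache_len, recent_budget=4, heavy_budget=4):
    """Return sorted indices of cache entries to keep."""
    recent = set(range(max(0, cache_len - recent_budget), cache_len))
    older = [i for i in range(cache_len) if i not in recent]
    # "Heavy hitters": older tokens with the largest accumulated attention mass.
    heavy = sorted(older, key=lambda i: acc_attention[i], reverse=True)[:heavy_budget]
    return sorted(recent | set(heavy))

acc = np.random.rand(16)            # accumulated attention score per cached token
print(evict_kv(acc, cache_len=16))  # indices retained under the toy budgets
```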
SmoothQuant: Accurate and efficient post-training quantization for large language models
Large language models (LLMs) show excellent performance but are compute- and memory-
intensive. Quantization can reduce memory and accelerate inference. However, existing …
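The core trick the title refers to can be sketched as a per-channel rescaling that migrates quantization difficulty from activations to weights while leaving the matrix product mathematically unchanged; the alpha and the statistics below are illustrative choices, not the paper's calibration procedure.

```python
# Sketch of per-channel smoothing: (X / s) @ (diag(s) W) == X @ W, but both
# factors become easier to quantize because activation outliers are tamed.
import numpy as np

def smooth(X, W, alpha=0.5):
    """X: (tokens, in), W: (in, out). Returns smoothed X', W' with X' @ W' == X @ W."""
    a_max = np.abs(X).max(axis=0) + 1e-8    # per-input-channel activation range
    w_max = np.abs(W).max(axis=1) + 1e-8    # per-input-channel weight range
    s = (a_max ** alpha) / (w_max ** (1 - alpha))
    return X / s, W * s[:, None]

X = np.random.randn(32, 64) * np.random.rand(64) * 10   # outlier-heavy channels
W = np.random.randn(64, 128)
Xs, Ws = smooth(X, W)
print(np.allclose(X @ W, Xs @ Ws))   # True: the product is preserved
```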
Deja Vu: Contextual sparsity for efficient LLMs at inference time
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
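The sketch below illustrates contextual sparsity in a single MLP block: a cheap low-rank predictor guesses which hidden units matter for the current input, and only those columns are computed. The predictor here is random and purely shows the control flow, not the trained predictors used in the paper.

```python
# Toy contextual-sparsity MLP: predict which neurons to compute per input.
import numpy as np

def sparse_mlp(x, W1, W2, A, B, keep_frac=0.1):
    """x: (d,); W1: (d, h); W2: (h, d); A: (d, r), B: (r, h) low-rank predictor."""
    approx = (x @ A) @ B                          # cheap proxy for hidden activity
    k = max(1, int(keep_frac * W1.shape[1]))
    idx = np.argpartition(-approx, k)[:k]         # neurons predicted to fire
    h = np.maximum(x @ W1[:, idx], 0.0)           # compute only the selected columns
    return h @ W2[idx]

d, hdim, r = 64, 256, 8
x = np.random.randn(d)
W1, W2 = np.random.randn(d, hdim), np.random.randn(hdim, d)
A, B = np.random.randn(d, r), np.random.randn(r, hdim)
print(sparse_mlp(x, W1, W2, A, B).shape)          # (64,)
```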
The case for 4-bit precision: k-bit inference scaling laws
Quantization methods reduce the number of bits required to represent each parameter in a
model, trading accuracy for smaller memory footprints and inference latencies. However, the …
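To make the bits-versus-accuracy trade-off the paper studies concrete, the toy quantizer below applies symmetric per-block round-to-nearest quantization at several bit widths and reports the reconstruction error; the block size and scheme are illustrative, not the paper's experimental setup.

```python
# Toy k-bit round-to-nearest quantizer with per-block scaling.
import numpy as np

def quant_dequant(w, k, block=64):
    """Symmetric per-block k-bit quantization of a 1-D weight vector."""
    qmax = 2 ** (k - 1) - 1
    out = np.empty_like(w)
    for start in range(0, len(w), block):
        blk = w[start:start + block]
        scale = np.abs(blk).max() / qmax + 1e-12
        out[start:start + block] = np.clip(np.round(blk / scale), -qmax - 1, qmax) * scale
    return out

w = np.random.randn(4096)
for k in (8, 4, 3, 2):
    err = np.abs(w - quant_dequant(w, k)).mean()
    print(f"{k}-bit mean abs error: {err:.4f}")   # error grows as bits shrink
```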
Scaling & shifting your features: A new baseline for efficient model tuning
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-
tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …
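A middle ground between those two extremes is to freeze the backbone and learn only a per-feature scale and shift, as the title suggests; the sketch below shows one such module, with its placement and initialization being assumptions rather than the paper's exact recipe.

```python
# Sketch of scale-and-shift tuning: the backbone is frozen, only gamma/beta train.
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # per-feature scale
        self.beta = nn.Parameter(torch.zeros(dim))   # per-feature shift
    def forward(self, x):
        return x * self.gamma + self.beta

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False                          # backbone stays frozen
head = nn.Sequential(ScaleShift(256), nn.Linear(256, 10))

x = torch.randn(4, 128)
print(head(backbone(x)).shape)                       # torch.Size([4, 10])
```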
LLM-QAT: Data-free quantization aware training for large language models
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …
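The sketch below shows the generic quantization-aware-training building block such work relies on: a fake-quantized forward pass with a straight-through estimator so gradients reach the full-precision weights. The data-free distillation that gives LLM-QAT its name is not shown.

```python
# Fake quantization with a straight-through estimator (generic QAT sketch).
import torch
import torch.nn as nn

def fake_quant(w, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward sees quantized weights; backward treats rounding as identity.
    return w + (w_q - w).detach()

layer = nn.Linear(64, 64)
x = torch.randn(8, 64)
y = x @ fake_quant(layer.weight).t() + layer.bias
loss = y.pow(2).mean()
loss.backward()                              # gradients flow to the FP weights
print(layer.weight.grad.abs().mean().item())
```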
On-device training under 256KB memory
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …
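One common ingredient of memory-frugal on-device training is to update only a small subset of parameters, for example biases and the classifier head, so that activation and optimizer memory stay tiny; the toy sketch below shows that idea only, and leaves out the quantized graphs and sparse-update scheduling described in the paper.

```python
# Toy memory-frugal fine-tuning: freeze most parameters, train biases + head.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)

trainable = []
for name, p in model.named_parameters():
    # Keep only biases and the final linear layer (module "4") trainable.
    p.requires_grad = name.endswith("bias") or name.startswith("4.")
    if p.requires_grad:
        trainable.append(p)

opt = torch.optim.SGD(trainable, lr=1e-2)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(sum(p.numel() for p in trainable), "trainable parameters")
```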