GPTVQ: The blessing of dimensionality for LLM quantization

M Van Baalen, A Kuzmin, M Nagel, P Couperus… - arXiv preprint arXiv…, 2024 - arxiv.org
In this work we show that the size versus accuracy trade-off of neural network quantization
can be significantly improved by increasing the quantization dimensionality. We propose the …
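
As a rough illustration of what raising the quantization dimensionality means, the sketch below quantizes a weight matrix in two-dimensional groups against a small k-means codebook instead of rounding scalars one at a time. This is a generic vector-quantization toy under assumed shapes and codebook size, not the GPTVQ method itself; all names are illustrative.

    import numpy as np

    def vq_quantize(W, dim=2, codebook_size=256, iters=10, seed=0):
        # Group the scalars of W into dim-dimensional vectors and snap each
        # vector to its nearest centroid from a k-means codebook.
        rng = np.random.default_rng(seed)
        vecs = W.reshape(-1, dim)
        centroids = vecs[rng.choice(len(vecs), codebook_size, replace=False)].copy()

        def assign(v, c):
            # Squared distances via ||v||^2 - 2 v.c + ||c||^2, then argmin.
            d = (v ** 2).sum(1, keepdims=True) - 2 * v @ c.T + (c ** 2).sum(1)
            return d.argmin(1)

        for _ in range(iters):  # plain Lloyd iterations
            a = assign(vecs, centroids)
            for k in range(codebook_size):
                members = vecs[a == k]
                if len(members):
                    centroids[k] = members.mean(0)
        return centroids[assign(vecs, centroids)].reshape(W.shape)

    W = np.random.default_rng(1).standard_normal((256, 256))
    W_vq = vq_quantize(W)  # 8-bit codes over 2-D vectors, i.e. 4 bits per weight
    print("reconstruction MSE:", float(((W - W_vq) ** 2).mean()))

At a fixed bit budget, multi-dimensional codes can exploit correlations between neighboring weights that independent scalar rounding cannot, which is the trade-off the abstract refers to.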

An information theory perspective on variance-invariance-covariance regularization

R Shwartz-Ziv, R Balestriero… - Advances in …, 2023 - proceedings.neurips.cc
Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised
learning (SSL) method that has shown promising results on a variety of tasks. However, the …
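
For context on the objective being analyzed, VICReg's published loss combines an invariance term between two views' embeddings, a variance hinge that keeps each embedding dimension's standard deviation above a threshold, and a covariance penalty on off-diagonal entries. A minimal PyTorch rendering, with the coefficients (25, 25, 1) and threshold γ=1 used as defaults in the original VICReg paper:

    import torch
    import torch.nn.functional as F

    def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
        # Invariance: mean-squared error between the two views' embeddings.
        inv = F.mse_loss(z_a, z_b)
        # Variance: hinge keeping each dimension's std above gamma.
        std_a = torch.sqrt(z_a.var(dim=0) + eps)
        std_b = torch.sqrt(z_b.var(dim=0) + eps)
        var = F.relu(gamma - std_a).mean() + F.relu(gamma - std_b).mean()
        # Covariance: penalize off-diagonal entries of each view's covariance.
        n, d = z_a.shape
        za, zb = z_a - z_a.mean(0), z_b - z_b.mean(0)
        cov_a, cov_b = (za.T @ za) / (n - 1), (zb.T @ zb) / (n - 1)
        off_diag = lambda c: (c - torch.diag(torch.diag(c))).pow(2).sum() / d
        return lam * inv + mu * var + nu * (off_diag(cov_a) + off_diag(cov_b))

The embeddings z_a and z_b are assumed to come from the same encoder applied to two augmentations of the same batch.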

Generalizing weather forecast to fine-grained temporal scales via physics-AI hybrid modeling

W Xu, F Ling, W Zhang, T Han, H Chen… - arXiv preprint arXiv…, 2024 - arxiv.org
Data-driven artificial intelligence (AI) models have made significant advancements in
weather forecasting, particularly in medium-range forecasting and nowcasting. However, most data …

VQ4DiT: Efficient post-training vector quantization for diffusion transformers

J Deng, S Li, Z Wang, H Gu, K Xu, K Huang - arXiv preprint arXiv…, 2024 - arxiv.org
Diffusion Transformer models (DiTs) have transitioned the network architecture from
traditional UNets to transformers, demonstrating exceptional capabilities in image …

Flexible quantization for efficient convolutional neural networks

FG Zacchigna, S Lew, A Lutenberg - Electronics, 2024 - mdpi.com
This work focuses on the efficient quantization of convolutional neural networks (CNNs).
Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a …

Towards super compressed neural networks for object identification: Quantized low-rank tensor decomposition with self-attention

B Liu, D Wang, Q Lv, Z Han, Y Tang - Electronics, 2024 - mdpi.com
Deep convolutional neural networks have a large number of parameters and require a
significant number of floating-point operations during computation, which limits their …
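
As a schematic of how the two compression schemes named in the title compose, the sketch below factorizes a weight matrix with a truncated SVD and then uniformly quantizes each factor. This is a generic illustration under assumed rank and bit-width, not the paper's tensor-decomposition-with-self-attention method:

    import numpy as np

    def lowrank_then_quantize(W, rank=32, bits=8):
        # Truncated SVD: W ~= A @ B with far fewer parameters than W itself.
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * S[:rank]   # (m, rank)
        B = Vt[:rank]                # (rank, n)

        def quantize(M):
            # Symmetric uniform quantization of one factor.
            scale = np.abs(M).max() / (2 ** (bits - 1) - 1)
            return np.round(M / scale) * scale

        return quantize(A) @ quantize(B)

    W = np.random.default_rng(0).standard_normal((256, 256))
    print("approx error:", float(np.linalg.norm(W - lowrank_then_quantize(W))))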

Fine-grained data distribution alignment for post-training quantization

Y Zhong, M Lin, M Chen, K Li, Y Shen, F Chao… - … on Computer Vision, 2022 - Springer
While post-training quantization is popular largely because it avoids access to
the original complete training dataset, its poor performance also stems from scarce images …
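
A common remedy in this line of work is to synthesize calibration data whose internal statistics match those the network stored during training. The sketch below optimizes a random batch so that each BatchNorm layer's batch statistics match its running statistics; this is a generic distribution-alignment sketch in the spirit of such methods, not FDDA's fine-grained scheme, and all shapes and hyperparameters are assumptions:

    import torch
    import torch.nn as nn

    def synthesize_calibration_batch(model, shape=(16, 3, 224, 224),
                                     steps=200, lr=0.1):
        # Optimize random inputs so each BatchNorm2d layer sees batch
        # statistics close to the running statistics stored in the model.
        model.eval()
        x = torch.randn(shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
        feats = {}

        def make_hook(m):
            def hook(module, inputs, output):
                feats[m] = inputs[0]  # activation entering this BN layer
            return hook

        handles = [m.register_forward_hook(make_hook(m)) for m in bn_layers]
        for _ in range(steps):
            opt.zero_grad()
            model(x)
            loss = x.new_zeros(())
            for m in bn_layers:
                f = feats[m]
                mean = f.mean(dim=(0, 2, 3))
                var = f.var(dim=(0, 2, 3), unbiased=False)
                loss = loss + (mean - m.running_mean).pow(2).mean() \
                            + (var - m.running_var).pow(2).mean()
            loss.backward()
            opt.step()
        for h in handles:
            h.remove()
        return x.detach()

The synthesized batch can then serve as calibration data for choosing quantization ranges when no real training images are available.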

YONO: Modeling multiple heterogeneous neural networks on microcontrollers

YD Kwon, J Chauhan, C Mascolo - 2022 21st ACM/IEEE …, 2022 - ieeexplore.ieee.org
Internet of Things (IoT) systems provide large amounts of data on all aspects of human
behavior. Machine learning techniques, especially deep neural networks (DNN), have …

Sub-8-bit quantization for on-device speech recognition: A regularization-free approach

K Zhen, M Radfar, H Nguyen, GP Strimel… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is
widely used to achieve the trade-off between model predictive performance and efficiency …
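
As background on QAT itself: training-time quantization is usually simulated with "fake quantization" in the forward pass and a straight-through estimator in the backward pass, so the model learns under rounding noise. A minimal generic sketch (not this paper's regularization-free sub-8-bit scheme):

    import torch

    class FakeQuant(torch.autograd.Function):
        # Symmetric uniform fake quantization with a straight-through
        # estimator: forward rounds onto the integer grid, backward passes
        # the incoming gradient through unchanged.
        @staticmethod
        def forward(ctx, w, bits):
            qmax = 2 ** (bits - 1) - 1
            scale = w.abs().max() / qmax
            return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None

    w = torch.randn(64, 64, requires_grad=True)
    w_q = FakeQuant.apply(w, 4)  # simulate 4-bit weights during training
    w_q.sum().backward()         # gradients flow straight through to w

In a real QAT loop this wraps the layer weights (and often activations) on every forward pass, which is what lets sub-8-bit models recover accuracy during training.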