GIVT: Generative Infinite-Vocabulary Transformers

M Tschannen, C Eastwood, F Mentzer - European Conference on …, 2024 - Springer
We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …

Finite Scalar Quantization: VQ-VAE Made Simple

F Mentzer, D Minnen, E Agustsson… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with
a simple scheme termed finite scalar quantization (FSQ), where we project the VAE …

Controlling rate, distortion, and realism: Towards a single comprehensive neural image compression model

S Iwai, T Miyazaki, S Omachi - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In recent years, neural network-driven image compression (NIC) has gained significant
attention. Some works adopt deep generative models such as GANs and diffusion models to …

Unified and scalable deep image compression framework for human and machine

G Zhang, X Zhang, L Tang - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Image compression aims to minimize the amount of data in image representation while
maintaining a certain visual quality for humans, which is an essential technique for storage …

Semantically-Guided Image Compression for Enhanced Perceptual Quality at Extremely Low Bitrates

S Iwai, T Miyazaki, S Omachi - IEEE Access, 2024 - ieeexplore.ieee.org
Image compression methods based on machine learning have achieved high rate-distortion
performance. However, the reconstructions they produce suffer from blurring at extremely …

ViT transfer learning for fMRI (VTFF): A highway to achieve superior performance for multi-classification of cognitive decline

B Wang… - … Signal Processing and …, 2025 - Elsevier
Early detection of cognitive impairment is a pivotal interdisciplinary research area in
contemporary cognitive neuroscience. Researchers employ multimodal data, including brain …

Continual Cross-domain Image Compression via Entropy Prior Guided Knowledge Distillation and Scalable Decoding

C Wu, Q Wu, R Ma, KN Ngan, H Li… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Learning based image compression has achieved impressive rate-distortion performance in
recent years. However, due to the disposable learning strategy and rigid network …

Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks

X Hu, W Ye, J Tang, E Ramadan, ZL Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Multiple Description Coding (MDC) is a promising error-resilient source coding method that
is particularly suitable for dynamic networks with multiple (yet noisy and unreliable) paths …

Dual-Conditioned Training to Exploit Pre-trained Codebook-based Generative Model in Image Compression

S Iwai, T Miyazaki, S Omachi - IEEE Access, 2024 - ieeexplore.ieee.org
Learned image compression (LIC) is increasingly gaining attention. To improve the
perceptual quality of reconstructions, generative LIC has been studied, using generative …

The Gap Between Principle and Practice of Lossy Image Coding

H Zhang, D Liu - arXiv preprint arXiv:2501.12330, 2025 - arxiv.org
Lossy image coding is the art of computing that is principally bounded by the image's
rate-distortion function. This bound, though never accurately characterized, has been …