GIVT: Generative infinite-vocabulary transformers
We introduce Generative Infinite-Vocabulary Transformers (GIVT), which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …
Finite scalar quantization: VQ-VAE made simple
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with
a simple scheme termed finite scalar quantization (FSQ), where we project the VAE …
Controlling rate, distortion, and realism: Towards a single comprehensive neural image compression model
In recent years, neural network-driven image compression (NIC) has gained significant
attention. Some works adopt deep generative models such as GANs and diffusion models to …
Unified and scalable deep image compression framework for human and machine
Image compression aims to minimize the amount of data in image representation while
maintaining a certain visual quality for humans, which is an essential technique for storage …
Semantically-Guided Image Compression for Enhanced Perceptual Quality at Extremely Low Bitrates
Image compression methods based on machine learning have achieved high rate-distortion
performance. However, the reconstructions they produce suffer from blurring at extremely …
ViT transfer learning for fMRI (VTFF): A highway to achieve superior performance for multi-classification of cognitive decline
B Wang… - … Signal Processing and …, 2025 - Elsevier
Early detection of cognitive impairment is a pivotal interdisciplinary research area in
contemporary cognitive neuroscience. Researchers employ multimodal data, including brain …
Continual Cross-domain Image Compression via Entropy Prior Guided Knowledge Distillation and Scalable Decoding
Learning based image compression has achieved impressive rate-distortion performance in
recent years. However, due to the disposable learning strategy and rigid network …
Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks
Multiple Description Coding (MDC) is a promising error-resilient source coding method that
is particularly suitable for dynamic networks with multiple (yet noisy and unreliable) paths …
Dual-Conditioned Training to Exploit Pre-trained Codebook-based Generative Model in Image Compression
Learned image compression (LIC) is increasingly gaining attention. To improve the
perceptual quality of reconstructions, generative LIC has been studied, using generative …
The Gap Between Principle and Practice of Lossy Image Coding
Lossy image coding is the art of computing that is principally bounded by the image's rate-
distortion function. This bound, though never accurately characterized, has been …