SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization
This paper presents the Semantic-aWarE spatial-tEmporal Tokenizer (SweetTokenizer), a compact
yet effective discretization approach …
STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading
In financial trading, factor models are widely used to price assets and capture excess returns
from mispricing. Recently, we have witnessed the rise of variational autoencoder-based …
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Efficient image tokenization with high compression ratios remains a critical challenge for
training generative models. We present SoftVQ-VAE, a continuous image tokenizer that …
Diverse Code Query Learning for Speech-Driven Facial Animation
Speech-driven facial animation aims to synthesize lip-synchronized 3D talking faces
following the given speech signal. Prior methods to this task mostly focus on pursuing …
LG-VQ: Language-Guided Codebook Learning
Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image
synthesis, which aims to learn a codebook to encode an image with a sequence of discrete …