Revisiting non-autoregressive transformers for efficient image synthesis
The field of image synthesis is currently flourishing due to the advancements in diffusion
models. While diffusion models have been successful, their computational intensity has …
Id-animator: Zero-shot identity-preserving human video generation
Generating high-fidelity human video with specified identities has attracted significant
attention in the content generation community. However, existing techniques struggle to …
Adanat: Exploring adaptive policy for token-based image generation
Recent studies have demonstrated the effectiveness of token-based methods for visual
content generation. As a representative work, non-autoregressive Transformers (NATs) are …
Visual cot: Unleashing chain-of-thought reasoning in multi-modal language models
This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of
multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought …
Extreme image compression using fine-tuned vqgans
Recent advances in generative compression methods have demonstrated remarkable
progress in enhancing the perceptual quality of compressed data, especially in scenarios …
PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
X Nie, H **, Y Yan, X Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Predictive learning models which aim to predict future frames based on past observations
are crucial to constructing world models. These models need to maintain low-level …
Text-Animator: Controllable Visual Text Video Generation
Video generation is a challenging yet pivotal task in various industries, such as gaming, e-
commerce, and advertising. One significant unresolved aspect within T2V is the effective …
Enat: Rethinking spatial-temporal interactions in token-based image synthesis
Recently, token-based generation methods have demonstrated their effectiveness in image synthesis.
As a representative example, non-autoregressive Transformers (NATs) can generate decent …