- Academic Search

Z Ni, Y Wang, R Zhou, J Guo, J Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com

The field of image synthesis is currently flourishing due to the advancements in diffusion
models. While diffusion models have been successful, their computational intensity has …

Id-animator: Zero-shot identity-preserving human video generation

X He, Q Liu, S Qian, X Wang, T Hu, K Cao… - arXiv preprint arXiv …, 2024 - arxiv.org

Generating high-fidelity human video with specified identities has attracted significant
attention in the content generation community. However, existing techniques struggle to …

Adanat: Exploring adaptive policy for token-based image generation

Z Ni, Y Wang, R Zhou, R Lu, J Guo, J Hu, Z Liu… - … on Computer Vision, 2024 - Springer

Recent studies have demonstrated the effectiveness of token-based methods for visual
content generation. As a representative work, non-autoregressive Transformers (NATs) are …

Visual CoT: Unleashing chain-of-thought reasoning in multi-modal language models

H Shao, S Qian, H Xiao, G Song, Z Zong… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of
multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought …

Extreme image compression using fine-tuned vqgans

Q Mao, T Yang, Y Zhang, Z Wang… - 2024 Data …, 2024 - ieeexplore.ieee.org

Recent advances in generative compression methods have demonstrated remarkable
progress in enhancing the perceptual quality of compressed data, especially in scenarios …

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding

X Nie, H Jin, Y Yan, X Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Predictive learning models which aim to predict future frames based on past observations
are crucial to constructing world models. These models need to maintain low-level …

Text-Animator: Controllable Visual Text Video Generation

L Liu, Q Liu, S Qian, Y Zhou, W Zhou, H Li, L Xie… - arXiv preprint arXiv …, 2024 - arxiv.org

Video generation is a challenging yet pivotal task in various industries, such as gaming, e-
commerce, and advertising. One significant unresolved aspect within T2V is the effective …

Enat: Rethinking spatial-temporal interactions in token-based image synthesis

Z Ni, Y Wang, R Zhou, Y Han, J Guo, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, token-based generation methods have demonstrated their effectiveness in image synthesis.
As a representative example, non-autoregressive Transformers (NATs) can generate decent …
