Emu3: Next-token prediction is all you need

X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …

Cited by 66

Videoscore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arxiv preprint arxiv …, 2024 - arxiv.org

The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

Cited by 23

Pyramidal flow matching for efficient video generative modeling

Y Jin, Z Sun, N Li, K Xu, H Jiang, N Zhuang… - arxiv preprint arxiv …, 2024 - arxiv.org

Video generation requires modeling a vast spatiotemporal space, which demands
significant computational resources and data usage. To reduce the complexity, the …

Cited by 19

T2v-turbo-v2: Enhancing video generation model post-training through data, reward, and conditional guidance design

J Li, Q Long, J Zheng, X Gao, R Piramuthu… - arxiv preprint arxiv …, 2024 - arxiv.org

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the
post-training phase by distilling a highly capable consistency model from a pretrained T2V …

Cited by 7

Improving dynamic object interactions in text-to-video generation with ai feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arxiv preprint arxiv …, 2024 - arxiv.org

Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Cited by 2

Videodpo: Omni-preference alignment for video diffusion generation

R Liu, H Wu, Z Ziqiang, C Wei, Y He, R Pi… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent progress in generative diffusion models has greatly advanced text-to-video
generation. While text-to-video models trained on large-scale, diverse datasets can produce …

Cited by 1

From slow bidirectional to fast causal video generators

T Yin, Q Zhang, R Zhang, WT Freeman… - arxiv preprint arxiv …, 2024 - arxiv.org

Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …

Cited by 1

Onlinevpo: Align video diffusion model with online video-centric preference optimization

J Zhang, J Wu, W Chen, Y Ji, X Xiao, W Huang… - arxiv preprint arxiv …, 2024 - arxiv.org

In recent years, the field of text-to-video (T2V) generation has made significant strides.
Despite this progress, there is still a gap between theoretical advancements and practical …

Cited by 1

Lift: Leveraging human feedback for text-to-video model alignment

Y Wang, Z Tan, J Wang, X Yang, C Jin, H Li - arxiv preprint arxiv …, 2024 - arxiv.org

Recent advancements in text-to-video (T2V) generative models have shown impressive
capabilities. However, these models are still inadequate in aligning synthesized videos with …

Cited by 1

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

W Feng, C Liu, S Liu, WY Wang, A Vahdat… - arxiv preprint arxiv …, 2025 - arxiv.org

Existing video generation models struggle to follow complex text prompts and synthesize
multiple objects, raising the need for additional grounding input for improved controllability …


T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Emu3: Next-token prediction is all you need
