Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Mora: Enabling generalist video generation via a multi-agent framework

Z Yuan, Y Liu, Y Cao, W Sun, H Jia, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video generation has made significant strides, but replicating the capabilities of
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …

FreeLong: Training-free long video generation with SpectralBlend temporal attention

Y Lu, Y Liang, L Zhu, Y Yang - arXiv preprint arXiv:2407.19918, 2024 - arxiv.org
Video diffusion models have made substantial progress in various video generation
applications. However, training models for long video generation tasks requires significant …

Anim-Director: A large multimodal model powered agent for controllable animation video generation

Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Traditional animation generation methods depend on training generative models with
human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial …

Compositional 3D-aware video generation with LLM director

H Zhu, T He, A Tang, J Guo, Z Chen, J Bian - arXiv preprint arXiv …, 2024 - arxiv.org
Significant progress has been made in text-to-video generation through the use of powerful
generative models and large-scale internet data. However, substantial challenges remain in …

Progressive autoregressive video diffusion models

D **e, Z Xu, Y Hong, H Tan, D Liu, F Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Current frontier video diffusion models have demonstrated remarkable results at generating
high-quality videos. However, they can only generate short video clips, normally around 10 …

A survey on long video generation: Challenges, methods, and prospects

C Li, D Huang, Z Lu, Y Xiao, Q Pei, L Bai - arXiv preprint arXiv:2403.16407, 2024 - arxiv.org
Video generation is a rapidly advancing research area, garnering significant attention due to
its broad range of applications. One critical aspect of this field is the generation of long …

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

H Zhao, W Ge, Y Chen - arXiv preprint arXiv:2405.17104, 2024 - arxiv.org
Visual grounding is an essential tool that links user-provided text queries with query-specific
regions within an image. Despite advancements in visual grounding models, their ability to …

From Sora What We Can See: A Survey of Text-to-Video Generation

R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With impressive achievements made, artificial intelligence is on the path toward artificial
general intelligence. Sora, developed by OpenAI, is capable of minute-level world …