Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
model is trained to generate videos of realistic or imaginative scenes from text instructions …
Videomamba: State space model for efficient video understanding
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
Mora: Enabling generalist video generation via a multi-agent framework
Text-to-video generation has made significant strides, but replicating the capabilities of
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …
Freelong: Training-free long video generation with spectralblend temporal attention
Video diffusion models have made substantial progress in various video generation
applications. However, training models for long video generation tasks require significant …
applications. However, training models for long video generation tasks require significant …
Anim-director: A large multimodal model powered agent for controllable animation video generation
Traditional animation generation methods depend on training generative models with
human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial …
human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial …
Compositional 3d-aware video generation with llm director
Significant progress has been made in text-to-video generation through the use of powerful
generative models and large-scale internet data. However, substantial challenges remain in …
generative models and large-scale internet data. However, substantial challenges remain in …
Progressive autoregressive video diffusion models
Current frontier video diffusion models have demonstrated remarkable results at generating
high-quality videos. However, they can only generate short video clips, normally around 10 …
high-quality videos. However, they can only generate short video clips, normally around 10 …
A survey on long video generation: Challenges, methods, and prospects
Video generation is a rapidly advancing research area, garnering significant attention due to
its broad range of applications. One critical aspect of this field is the generation of long …
its broad range of applications. One critical aspect of this field is the generation of long …
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Visual grounding is an essential tool that links user-provided text queries with query-specific
regions within an image. Despite advancements in visual grounding models, their ability to …
regions within an image. Despite advancements in visual grounding models, their ability to …
From Sora What We Can See: A Survey of Text-to-Video Generation
With impressive achievements made, artificial intelligence is on the path forward to artificial
general intelligence. Sora, developed by OpenAI, which is capable of minute-level world …
general intelligence. Sora, developed by OpenAI, which is capable of minute-level world …