Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Mora: Enabling generalist video generation via a multi-agent framework

Z Yuan, Y Liu, Y Cao, W Sun, H Jia, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video generation has made significant strides, but replicating the capabilities of
advanced systems like OpenAI Sora remains challenging due to their closed-source nature …

FreeLong: Training-free long video generation with SpectralBlend temporal attention

Y Lu, Y Liang, L Zhu, Y Yang - arXiv preprint arXiv:2407.19918, 2024 - arxiv.org
Video diffusion models have made substantial progress in various video generation
applications. However, training models for long video generation tasks requires significant …

Anim-Director: A large multimodal model powered agent for controllable animation video generation

Y Li, H Shi, B Hu, L Wang, J Zhu, J Xu, Z Zhao… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Traditional animation generation methods depend on training generative models with
human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial …

Compositional 3D-aware video generation with LLM director

H Zhu, T He, A Tang, J Guo, Z Chen, J Bian - arXiv preprint arXiv …, 2024 - arxiv.org
Significant progress has been made in text-to-video generation through the use of powerful
generative models and large-scale internet data. However, substantial challenges remain in …

Progressive autoregressive video diffusion models

D **e, Z Xu, Y Hong, H Tan, D Liu, F Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
Current frontier video diffusion models have demonstrated remarkable results at generating
high-quality videos. However, they can only generate short video clips, normally around 10 …

A survey on long video generation: Challenges, methods, and prospects

C Li, D Huang, Z Lu, Y Xiao, Q Pei, L Bai - arXiv preprint arXiv:2403.16407, 2024 - arxiv.org
Video generation is a rapidly advancing research area, garnering significant attention due to
its broad range of applications. One critical aspect of this field is the generation of long …

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

H Zhao, W Ge, Y Chen - arXiv preprint arXiv:2405.17104, 2024 - arxiv.org
Visual grounding is an essential tool that links user-provided text queries with query-specific
regions within an image. Despite advancements in visual grounding models, their ability to …

From Sora What We Can See: A Survey of Text-to-Video Generation

R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With impressive achievements made, artificial intelligence is on the path toward artificial
general intelligence. Sora, developed by OpenAI, is capable of minute-level world …