Google Scholar

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org

Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Cited by 3

π0: A Vision-Language-Action Flow Model for General Robot Control

K Black, N Brown, D Driess, A Esmail, M Equi… - arXiv preprint arXiv …, 2024 - arxiv.org

Robot learning holds tremendous promise to unlock the full potential of flexible, general, and
dexterous robot systems, as well as to address some of the deepest questions in artificial …

Cited by 15

Open-Sora Plan: Open-source large video generation model

B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Open-Sora Plan, an open-source project that aims to contribute a large
generation model for generating desired high-resolution videos with long durations based …

Cited by 9

Provably optimal memory capacity for modern Hopfield models: Transformer-compatible dense associative memories as spherical codes

JYC Hu, D Wu, H Liu - arXiv preprint arXiv:2410.23126, 2024 - arxiv.org

We study the optimal memorization capacity of modern Hopfield models and Kernelized
Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories …

Cited by 10

Inference-time scaling for diffusion models beyond scaling denoising steps

N Ma, S Tong, H Jia, H Hu, YC Su, M Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org

Generative models have made significant impacts across various domains, largely due to
their ability to scale during training by increasing data, computational resources, and model …

Cited by 5

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

S Yuan, J Huang, X He, Y Ge, Y Shi, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with
consistent human identity. It is an important task in video generation but remains an open …

Cited by 4

ReCapture: Generative video camera controls for user-provided videos using masked video fine-tuning

DJ Zhang, R Paiss, S Zada, N Karnad… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, breakthroughs in video modeling have allowed for controllable camera trajectories
in generated videos. However, these methods cannot be directly applied to user-provided …

Cited by 4

Motion Prompting: Controlling Video Generation with Motion Trajectories

D Geng, C Herrmann, J Hur, F Cole, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …

Cited by 2

Navigation world models

A Bar, G Zhou, D Tran, T Darrell, Y LeCun - arXiv preprint arXiv …, 2024 - arxiv.org

Navigation is a fundamental skill of agents with visual-motor capabilities. We introduce a
Navigation World Model (NWM), a controllable video generation model that predicts future …

Cited by 2

LTX-Video: Realtime video latent diffusion

Y HaCohen, N Chiprut, B Brazowski, D Shalem… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic
approach to video generation by seamlessly integrating the responsibilities of the Video …

Cited by 3
