Benchmark evaluations, applications, and challenges of large vision language models: A survey
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …
π0: A Vision-Language-Action Flow Model for General Robot Control
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and
dexterous robot systems, as well as to address some of the deepest questions in artificial …
Open-Sora Plan: Open-source large video generation model
We introduce Open-Sora Plan, an open-source project that aims to contribute a large
generation model for generating desired high-resolution videos with long durations based …
Provably optimal memory capacity for modern Hopfield models: Transformer-compatible dense associative memories as spherical codes
We study the optimal memorization capacity of modern Hopfield models and Kernelized
Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories …
Inference-time scaling for diffusion models beyond scaling denoising steps
Generative models have made significant impacts across various domains, largely due to
their ability to scale during training by increasing data, computational resources, and model …
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with
consistent human identity. It is an important task in video generation but remains an open …
Recapture: Generative video camera controls for user-provided videos using masked video fine-tuning
Recently, breakthroughs in video modeling have allowed for controllable camera trajectories
in generated videos. However, these methods cannot be directly applied to user-provided …
Motion Prompting: Controlling Video Generation with Motion Trajectories
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …
Navigation world models
Navigation is a fundamental skill of agents with visual-motor capabilities. We introduce a
Navigation World Model (NWM), a controllable video generation model that predicts future …
LTX-Video: Realtime video latent diffusion
We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic
approach to video generation by seamlessly integrating the responsibilities of the Video …