Open-MAGVIT2: An open-source project toward democratizing auto-regressive visual generation
We present Open-MAGVIT2, a family of auto-regressive image generation models ranging
from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of …
Do generative video models learn physical principles from watching videos?
AI video generation is undergoing a revolution, with quality and realism advancing rapidly.
These advances have led to a passionate scientific debate: Do video models learn "world …
A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …
Multimodal Medical Code Tokenizer
Foundation models trained on patient electronic health records (EHRs) require tokenizing
medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical …
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
This paper aims to improve the performance of video multimodal large language models
(MLLM) via long and rich context (LRC) modeling. As a result, we develop a new version of …
DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Understanding and modeling lighting effects are fundamental tasks in computer vision and
graphics. Classic physically-based rendering (PBR) accurately simulates the light transport …
Improving the Diffusability of Autoencoders
Latent diffusion models have emerged as the leading approach for generating high-quality
images and videos, utilizing compressed latent representations to reduce the computational …
Goku: Flow Based Video Generative Foundation Models
This paper introduces Goku, a state-of-the-art family of joint image-and-video generation
models leveraging rectified flow Transformers to achieve industry-leading performance. We …
Trajectory World Models for Heterogeneous Environments
Heterogeneity in sensors and actuators across environments poses a significant challenge
to building large-scale pre-trained world models on top of this low-dimensional sensor …
Single-Channel EEG Tokenization Through Time-Frequency Modeling
We introduce TFM-Tokenizer, a novel tokenization framework tailored for EEG analysis that
transforms continuous, noisy brain signals into a sequence of discrete, well-represented …