Google Scholar

Open-magvit2: An open-source project toward democratizing auto-regressive visual generation

Z Luo, F Shi, Y Ge, Y Yang, L Wang, Y Shan - arXiv preprint arXiv …, 2024 - arxiv.org

We present Open-MAGVIT2, a family of auto-regressive image generation models ranging
from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of …

Cited by: 28

Do generative video models learn physical principles from watching videos?

S Motamed, L Culp, K Swersky, P Jaini… - arXiv preprint arXiv …, 2025 - arxiv.org

AI video generation is undergoing a revolution, with quality and realism advancing rapidly.
These advances have led to a passionate scientific debate: Do video models learn "world …

Cited by: 3

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arXiv preprint arXiv …, 2025 - arxiv.org

Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

Cited by: 1

Multimodal Medical Code Tokenizer

X Su, S Messica, Y Huang, R Johnson, L Fesser… - arXiv preprint arXiv …, 2025 - arxiv.org

Foundation models trained on patient electronic health records (EHRs) require tokenizing
medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical …

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Y Wang, X Li, Z Yan, Y He, J Yu, X Zeng… - arXiv preprint arXiv …, 2025 - arxiv.org

This paper aims to improve the performance of video multimodal large language models
(MLLM) via long and rich context (LRC) modeling. As a result, we develop a new version of …

DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models

R Liang, Z Gojcic, H Ling, J Munkberg… - arXiv preprint arXiv …, 2025 - arxiv.org

Understanding and modeling lighting effects are fundamental tasks in computer vision and
graphics. Classic physically-based rendering (PBR) accurately simulates the light transport …

Improving the Diffusability of Autoencoders

I Skorokhodov, S Girish, B Hu, W Menapace… - arXiv preprint arXiv …, 2025 - arxiv.org

Latent diffusion models have emerged as the leading approach for generating high-quality
images and videos, utilizing compressed latent representations to reduce the computational …

Goku: Flow Based Video Generative Foundation Models

S Chen, C Ge, Y Zhang, Y Zhang, F Zhu… - arXiv preprint arXiv …, 2025 - arxiv.org

This paper introduces Goku, a state-of-the-art family of joint image-and-video generation
models leveraging rectified flow Transformers to achieve industry-leading performance. We …

Trajectory World Models for Heterogeneous Environments

S Yin, J Wu, S Huang, X Su, X He, J Hao… - arXiv preprint arXiv …, 2025 - arxiv.org

Heterogeneity in sensors and actuators across environments poses a significant challenge
to building large-scale pre-trained world models on top of this low-dimensional sensor …

DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

Z Yuan, S Wang, R Xie, H Zhang, T Fang… - arXiv preprint arXiv …, 2025 - arxiv.org

In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free
paradigm that can make use of adaptive temporal compression in latent space. While …

Cosmos world foundation model platform for physical ai

Open-magvit2: An open-source project toward democratizing auto-regressive visual generation
