Google Scholar

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Z Luo, F Shi, Y Ge, Y Yang, L Wang, Y Shan - arXiv preprint arXiv …, 2024 - arxiv.org

We present Open-MAGVIT2, a family of auto-regressive image generation models ranging
from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of …

Cited by 30

Do generative video models learn physical principles from watching videos?

S Motamed, L Culp, K Swersky, P Jaini… - arXiv preprint arXiv …, 2025 - arxiv.org

AI video generation is undergoing a revolution, with quality and realism advancing rapidly.
These advances have led to a passionate scientific debate: Do video models learn "world …

Cited by 3

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arXiv preprint arXiv …, 2025 - arxiv.org

Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

Cited by 1

Multimodal Medical Code Tokenizer

X Su, S Messica, Y Huang, R Johnson, L Fesser… - arXiv preprint arXiv …, 2025 - arxiv.org

Foundation models trained on patient electronic health records (EHRs) require tokenizing
medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical …

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Y Wang, X Li, Z Yan, Y He, J Yu, X Zeng… - arXiv preprint arXiv …, 2025 - arxiv.org

This paper aims to improve the performance of video multimodal large language models
(MLLM) via long and rich context (LRC) modeling. As a result, we develop a new version of …

DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models

R Liang, Z Gojcic, H Ling, J Munkberg… - arXiv preprint arXiv …, 2025 - arxiv.org

Understanding and modeling lighting effects are fundamental tasks in computer vision and
graphics. Classic physically-based rendering (PBR) accurately simulates the light transport …

Improving the Diffusability of Autoencoders

I Skorokhodov, S Girish, B Hu, W Menapace… - arXiv preprint arXiv …, 2025 - arxiv.org

Latent diffusion models have emerged as the leading approach for generating high-quality
images and videos, utilizing compressed latent representations to reduce the computational …

Goku: Flow Based Video Generative Foundation Models

S Chen, C Ge, Y Zhang, Y Zhang, F Zhu… - arXiv preprint arXiv …, 2025 - arxiv.org

This paper introduces Goku, a state-of-the-art family of joint image-and-video generation
models leveraging rectified flow Transformers to achieve industry-leading performance. We …

Trajectory World Models for Heterogeneous Environments

S Yin, J Wu, S Huang, X Su, X He, J Hao… - arXiv preprint arXiv …, 2025 - arxiv.org

Heterogeneity in sensors and actuators across environments poses a significant challenge
to building large-scale pre-trained world models on top of this low-dimensional sensor …

Single-Channel EEG Tokenization Through Time-Frequency Modeling

J Pradeepkumar, X Piao, Z Chen, J Sun - arXiv preprint arXiv:2502.16060, 2025 - arxiv.org

We introduce TFM-Tokenizer, a novel tokenization framework tailored for EEG analysis that
transforms continuous, noisy brain signals into a sequence of discrete, well-represented …

Cosmos World Foundation Model Platform for Physical AI

Open-magvit2: An open-source project toward democratizing auto-regressive visual generation
