Open-MAGVIT2: An open-source project toward democratizing auto-regressive visual generation
We present Open-MAGVIT2, a family of auto-regressive image generation models ranging
from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of …
Do generative video models learn physical principles from watching videos?
AI video generation is undergoing a revolution, with quality and realism advancing rapidly.
These advances have led to a passionate scientific debate: Do video models learn "world …
A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …
Multimodal Medical Code Tokenizer
Foundation models trained on patient electronic health records (EHRs) require tokenizing
medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical …
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
This paper aims to improve the performance of video multimodal large language models
(MLLM) via long and rich context (LRC) modeling. As a result, we develop a new version of …
DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Understanding and modeling lighting effects are fundamental tasks in computer vision and
graphics. Classic physically-based rendering (PBR) accurately simulates the light transport …
Improving the Diffusability of Autoencoders
Latent diffusion models have emerged as the leading approach for generating high-quality
images and videos, utilizing compressed latent representations to reduce the computational …
Goku: Flow Based Video Generative Foundation Models
This paper introduces Goku, a state-of-the-art family of joint image-and-video generation
models leveraging rectified flow Transformers to achieve industry-leading performance. We …
Trajectory World Models for Heterogeneous Environments
Heterogeneity in sensors and actuators across environments poses a significant challenge
to building large-scale pre-trained world models on top of this low-dimensional sensor …
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free
paradigm that can make use of adaptive temporal compression in latent space. While …