Visual autoregressive modeling: Scalable image generation via next-scale prediction
We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines the autoregressive learning on images as coarse-to-fine "next-scale …
MiraData: A large-scale video dataset with long durations and structured captions
Sora's high-motion intensity and long consistent videos have significantly impacted the field
of video generation, attracting unprecedented attention. However, existing publicly available …
Emu3: Next-token prediction is all you need
While next-token prediction is considered a promising path towards artificial general
intelligence, it has struggled to excel in multimodal tasks, which are still dominated by …
Open-Sora: Democratizing efficient video production for all
Vision and language are the two foundational senses for humans, and they build up our
cognitive ability and intelligence. While significant breakthroughs have been made in AI …
Improved distribution matching distillation for fast image synthesis
Recent approaches have shown promise in distilling diffusion models into efficient one-step
generators. Among them, Distribution Matching Distillation (DMD) produces one-step …
GenAI Arena: An open evaluation platform for generative models
Generative AI has made remarkable strides to revolutionize fields such as image and video
generation. These advancements are driven by innovative algorithms, architecture, and …
DreamLIP: Language-image pre-training with long captions
Language-image pre-training largely relies on how precisely and thoroughly a text
describes its paired image. In practice, however, the contents of an image can be so rich that …
Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …
Representation alignment for generation: Training diffusion transformers is easier than you think
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …
DiTFastAttn: Attention compression for diffusion transformer models
Diffusion Transformers (DiT) excel at image and video generation but face
computational challenges due to the quadratic complexity of self-attention operators. We …