Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Transformers in vision: A survey
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …
vision community to study their application to computer vision problems. Among their salient …
A survey on deep learning for human activity recognition
Human activity recognition is a key to a lot of applications such as healthcare and smart
home. In this study, we provide a comprehensive survey on recent advances and challenges …
home. In this study, we provide a comprehensive survey on recent advances and challenges …
Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives
Abstract We present Ego-Exo4D a diverse large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
Video probabilistic diffusion models in projected latent space
Despite the remarkable progress in deep generative models, synthesizing high-resolution
and temporally coherent videos still remains a challenge due to their high-dimensionality …
and temporally coherent videos still remains a challenge due to their high-dimensionality …
Magvit: Masked generative video transformer
Abstract We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …
Learning video representations from large language models
We introduce LAVILA, a new approach to learning video-language representations by
leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be …
leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be …
Ego4d: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Factorizing text-to-video generation by explicit image conditioning
Abstract We present Emu Video, a text-to-video generation model that factorizes the
generation into two steps: first generating an image conditioned on the text, and then …
generation into two steps: first generating an image conditioned on the text, and then …
Amt: All-pairs multi-field transforms for efficient frame interpolation
Abstract We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …
video frame interpolation. It is based on two essential designs. First, we build bidirectional …