Google Scholar

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2977 · Related articles · All 8 versions

[PDF] researchgate.net

A survey on deep learning for human activity recognition

F Gu, MH Chung, M Chignell, S Valaee… - ACM Computing …, 2021 - dl.acm.org

Human activity recognition is a key to a lot of applications such as healthcare and smart
home. In this study, we provide a comprehensive survey on recent advances and challenges …

Cited by 218 · Related articles · All 3 versions

[PDF] thecvf.com

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Ego-Exo4D, a diverse large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Cited by 134 · Related articles · All 11 versions · View as HTML

[PDF] arxiv.org

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer

We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …

Cited by 141 · Related articles · All 10 versions

[PDF] thecvf.com

Video probabilistic diffusion models in projected latent space

S Yu, K Sohn, S Kim, J Shin - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Despite the remarkable progress in deep generative models, synthesizing high-resolution
and temporally coherent videos still remains a challenge due to their high-dimensionality …

Cited by 177 · Related articles · All 10 versions · View as HTML

[PDF] thecvf.com

MAGVIT: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …

Cited by 220 · Related articles · All 8 versions · View as HTML

[PDF] thecvf.com

Learning video representations from large language models

Y Zhao, I Misra, P Krähenbühl… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce LAVILA, a new approach to learning video-language representations by
leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be …

Cited by 179 · Related articles · All 8 versions · View as HTML

[PDF] thecvf.com

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 1016 · Related articles · All 20 versions · View as HTML

[PDF] arxiv.org

Factorizing text-to-video generation by explicit image conditioning

R Girdhar, M Singh, A Brown, Q Duval, S Azadi… - … on Computer Vision, 2024 - Springer

We present Emu Video, a text-to-video generation model that factorizes the
generation into two steps: first generating an image conditioned on the text, and then …

Cited by 73 · Related articles · All 6 versions

[PDF] thecvf.com

AMT: All-pairs multi-field transforms for efficient frame interpolation

Z Li, ZL Zhu, LH Han, Q Hou… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …

Cited by 85 · Related articles · All 7 versions · View as HTML
