Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 68

Attention bottlenecks for multimodal fusion

A Nagrani, S Yang, A Arnab, A Jansen… - Advances in neural …, 2021 - proceedings.neurips.cc

Humans perceive the world by concurrently processing and fusing high-dimensional inputs
from multiple modalities such as vision and audio. Machine perception models, in stark …

Cited by 644

A comprehensive review of recent deep learning techniques for human activity recognition

VT Le, K Tran-Trung, VT Hoang - Computational Intelligence …, 2022 - Wiley Online Library

Human action recognition is an important field in computer vision that has attracted
remarkable attention from researchers. This survey aims to provide a comprehensive …

Cited by 38

SlowFast networks for video recognition

C Feichtenhofer, H Fan, J Malik… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway,
operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating …

Cited by 4174

ForgeryNet: A versatile benchmark for comprehensive forgery analysis

Y He, B Gan, S Chen, Y Zhou, G Yin… - Proceedings of the …, 2021 - openaccess.thecvf.com

The rapid progress of photorealistic synthesis techniques has reached a critical point
where the boundary between real and manipulated images starts to blur. Thus …

Cited by 161

Audiovisual SlowFast networks for video recognition

F Xiao, YJ Lee, K Grauman, J Malik… - arXiv preprint arXiv …, 2020 - arxiv.org

We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual
perception. AVSlowFast has Slow and Fast visual pathways that are deeply integrated with a …

Cited by 259

SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos

A Deliege, A Cioppa, S Giancola… - Proceedings of the …, 2021 - openaccess.thecvf.com

Understanding broadcast videos is a challenging task in computer vision, as it requires
generic reasoning capabilities to appreciate the content offered by the video editing. In this …

Cited by 170

Learning spatio-temporal representation with local and global diffusion

Z Qiu, T Yao, CW Ngo, X Tian… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Convolutional Neural Networks (CNN) have been regarded as a powerful class of
models for visual recognition problems. Nevertheless, the convolutional filters in these …

Cited by 231

TSP: Temporally-sensitive pretraining of video encoders for localization tasks

H Alwassel, S Giancola… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …

Cited by 155

Audio visual scene-aware dialog

H Alamri, V Cartillier, A Das, J Wang… - Proceedings of the …, 2019 - openaccess.thecvf.com

We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural
response to a question about a scene, given video and audio of the scene and the history of …

Cited by 212

The ActivityNet large-scale activity recognition challenge 2018 summary

Learning in audio-visual context: A review, analysis, and new perspective
