Backbones-review: Feature extraction networks for deep learning and deep reinforcement learning approaches
To understand the real world using various types of data, Artificial Intelligence (AI) is the
most used technique nowadays. While finding the pattern within the analyzed data …
most used technique nowadays. While finding the pattern within the analyzed data …
Dynamic neural networks: A survey
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …
models which have fixed computational graphs and parameters at the inference stage …
X3d: Expanding architectures for efficient video recognition
C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …
tiny 2D image classification architecture along multiple network axes, in space, time, width …
Self-chained image-language model for video localization and question answering
Recent studies have shown promising results on utilizing large pre-trained image-language
models for video question answering. While these image-language models can efficiently …
models for video question answering. While these image-language models can efficiently …
Beyond robustness: A taxonomy of approaches towards resilient multi-robot systems
Robustness is key to engineering, automation, and science as a whole. However, the
property of robustness is often underpinned by costly requirements such as over …
property of robustness is often underpinned by costly requirements such as over …
Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
Uniformerv2: Unlocking the potential of image vits for video understanding
The prolific performances of Vision Transformers (ViTs) in image tasks have prompted
research into adapting the image ViTs for video tasks. However, the substantial gap …
research into adapting the image ViTs for video tasks. However, the substantial gap …
Listen to look: Action recognition by previewing audio
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly
impractical. We propose a framework for efficient action recognition in untrimmed video that …
impractical. We propose a framework for efficient action recognition in untrimmed video that …
Revisiting classifier: Transferring vision-language models for video recognition
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …
an important topic in computer vision research. Along with the growth of computational …
Smart frame selection for action recognition
Video classification is computationally expensive. In this paper, we address theproblem of
frame selection to reduce the computational cost of video classification. Recent work has …
frame selection to reduce the computational cost of video classification. Recent work has …