An overview of Human Action Recognition in sports based on Computer Vision
Abstract Human Action Recognition (HAR) is a challenging task used in sports such as
volleyball, basketball, soccer, and tennis to detect players and recognize their actions and …
volleyball, basketball, soccer, and tennis to detect players and recognize their actions and …
Action recognition based on RGB and skeleton data sets: A survey
Action recognition is a major branch of computer vision research. As a widely used
technology, action recognition has been applied to human–computer interaction, intelligent …
technology, action recognition has been applied to human–computer interaction, intelligent …
Coca: Contrastive captioners are image-text foundation models
Exploring large-scale pretrained foundation models is of significant interest in computer
vision because these models can be quickly transferred to many downstream tasks. This …
vision because these models can be quickly transferred to many downstream tasks. This …
Flamingo: a visual language model for few-shot learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …
examples is an open challenge for multimodal machine learning research. We introduce …
Socratic models: Composing zero-shot multimodal reasoning with language
Large pretrained (eg," foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …
domain of data they are trained on. While these domains are generic, they may only barely …
Perceiver io: A general architecture for structured inputs & outputs
A central goal of machine learning is the development of systems that can solve many
problems in as many data domains as possible. Current architectures, however, cannot be …
problems in as many data domains as possible. Current architectures, however, cannot be …
Less is more: Clipbert for video-and-language learning via sparse sampling
The canonical approach to video-and-language learning (eg, video question answering)
dictates a neural model to learn from offline-extracted dense video features from vision …
dictates a neural model to learn from offline-extracted dense video features from vision …
Diffusion probabilistic modeling for video generation
Denoising diffusion probabilistic models are a promising new class of generative models
that mark a milestone in high-quality image generation. This paper showcases their ability to …
that mark a milestone in high-quality image generation. This paper showcases their ability to …
Revisiting the" video" in video-language understanding
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …
single image? Building on recent progress in self-supervised image-language models, we …
Fetv: A benchmark for fine-grained evaluation of open-domain text-to-video generation
Recently, open-domain text-to-video (T2V) generation models have made remarkable
progress. However, the promising results are mainly shown by the qualitative cases of …
progress. However, the promising results are mainly shown by the qualitative cases of …