Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

How do you do it? fine-grained action understanding with pseudo-adverbs

H Doughty, CGM Snoek - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
We aim to understand how actions are performed and identify subtle differences, such as
'fold firmly' vs. 'fold gently'. To this end, we propose a method which recognizes adverbs …

Alignment-uniformity aware representation learning for zero-shot video classification

S Pu, K Zhao, M Zheng - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Most methods tackle zero-shot video classification by aligning visual-semantic
representations within seen classes, which limits generalization to unseen classes. To …

Actionhub: a large-scale action video description dataset for zero-shot action recognition

J Zhou, J Liang, KY Lin, J Yang, WS Zheng - arXiv preprint arXiv …, 2024 - arxiv.org
Zero-shot action recognition (ZSAR) aims to learn an alignment model between videos and
class descriptions of seen actions that is transferable to unseen actions. The text queries …

Tell me what you see: A zero-shot action recognition method based on natural language descriptions

V Estevam, R Laroca, H Pedrini, D Menotti - Multimedia Tools and …, 2024 - Springer
This paper presents a novel approach to Zero-Shot Action Recognition. Recent works have
explored the detection and classification of objects to obtain semantic information from …

Deconfounding causal inference for zero-shot action recognition

J Wang, Y Jiang, Y Long, X Sun… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test
set without corresponding training examples. Most existing zero-shot methods follow the …

Routing evidence for unseen actions in video moment retrieval

G Wang, X Wu, Z Qin, L Shi - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
Video moment retrieval (VMR) is a cutting-edge vision-language task locating a segment in
a video according to the query. Though the methods have achieved significant performance …

Bi-calibration networks for weakly-supervised video representation learning

F Long, T Yao, Z Qiu, X Tian, J Luo, T Mei - International Journal of …, 2023 - Springer
The leverage of large volumes of web videos paired with the query (short phrase for
searching the video) or surrounding text (long textual description, e.g., video title) offers an …

Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification

B Wang, K Zhao, H Zhao, S Pu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video attributes, which leverage video contents to instantiate class semantics, play a critical
role in diversifying semantics in zero-shot video classification, thereby facilitating semantic …

Zero-shot action recognition from diverse object-scene compositions

C Bretti, P Mettes - arXiv preprint arXiv:2110.13479, 2021 - arxiv.org
This paper investigates the problem of zero-shot action recognition, in the setting where no
training videos with seen actions are available. For this challenging scenario, the current …