Spatial relationship recognition via heterogeneous representation: A review

Y Wang, H Peng, Y Xiong, H Song - Neurocomputing, 2023 - Elsevier
Spatial relationship between objects in an image can help to gain a deep understanding of
the image. At present, spatial relationship recognition has received more and more …

ERA: Expert retrieval and assembly for early action prediction

LG Foo, T Li, H Rahmani, Q Ke, J Liu - European Conference on Computer …, 2022 - Springer
Early action prediction aims to successfully predict the class label of an action before it is
completely performed. This is a challenging task because the beginning stages of different …

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

R Liao, M Erler, H Wang, G Zhai, G Zhang, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
In the video-language domain, recent works in leveraging zero-shot Large Language Model-
based reasoning for video understanding have become competitive challengers to previous …

Cognitive accident prediction in driving scenes: A multimodality benchmark

J Fang, LL Li, K Yang, Z Zheng, J Xue… - arXiv preprint arXiv …, 2022 - arxiv.org
Traffic accident prediction in driving videos aims to provide an early warning of the accident
occurrence, and supports the decision making of safe driving systems. Previous works …

Magi-net: Meta negative network for early activity prediction

W Wang, F Chang, J Zhang, R Yan… - … on Image Processing, 2023 - ieeexplore.ieee.org
Early activity prediction/recognition aims to recognize action categories before they are fully
conveyed. Compared to full-length action sequences, partial video sequences only provide …

Bi-calibration networks for weakly-supervised video representation learning

F Long, T Yao, Z Qiu, X Tian, J Luo, T Mei - International Journal of …, 2023 - Springer
The leverage of large volumes of web videos paired with the query (short phrase for
searching the video) or surrounding text (long textual description, e.g., video title) offers an …

Visual spatial description: Controlled spatial-oriented image-to-text generation

Y Zhao, J Wei, Z Lin, Y Sun, M Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Image-to-text tasks, such as open-ended image captioning and controllable image
description, have received extensive attention for decades. Here, we further advance this …

Ambiguousness-aware state evolution for action prediction

L Chen, J Lu, Z Song, J Zhou - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
In this paper, we propose an ambiguousness-aware state evolution (AASE) method which
represents the uncertainty of the input sequence and evolves the subsequent skeletons to …

A Dynamic Position Embedding-Based Model for Student Classroom Complete Meta-Action Recognition

Z Shou, X Yuan, D Li, J Mo, H Zhang, J Zhang, Z Wu - Sensors, 2024 - mdpi.com
The precise recognition of entire classroom meta-actions is a crucial challenge for the
tailored adaptive interpretation of student behavior, given the intricacy of these actions. This …

Interpretable deep feature propagation for early action recognition

H Zhao, RP Wildes - arXiv preprint arXiv:2107.05122, 2021 - arxiv.org
Early action recognition (action prediction) from limited preliminary observations plays a
critical role for streaming vision systems that demand real-time inference, as video actions …