Global spatio-temporal synergistic topology learning for skeleton-based action recognition
Compared to RGB video-based action recognition, skeleton-based action recognition
algorithm has attracted much more attention due to being more lightweight, better …
Continuous sign language recognition through cross-modal alignment of video and text embeddings in a joint-latent space
Continuous Sign Language Recognition (CSLR) refers to the challenging problem of
recognizing sign language glosses and their temporal boundaries from weakly annotated …
An effective video transformer with synchronized spatiotemporal and spatial self-attention for action recognition
Convolutional neural networks (CNNs) have come to dominate vision-based deep neural
network structures in both image and video models over the past decade. However …
HUMAN4D: A human-centric multimodal dataset for motions and immersive media
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of
human activities simultaneously captured by a professional marker-based MoCap, a …