A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Beyond supervised learning for pervasive healthcare

X Gu, F Deligianni, J Han, X Liu, W Chen… - IEEE Reviews in …, 2023 - ieeexplore.ieee.org
The integration of machine/deep learning and sensing technologies is transforming
healthcare and medical practice. However, inherent limitations in healthcare data, namely …

TCTrack: Temporal contexts for aerial tracking

Z Cao, Z Huang, L Pan, S Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Temporal contexts among consecutive frames are far from being fully utilized in existing
visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit …

Disentangling spatial and temporal learning for efficient image-to-video transfer learning

Z Qing, S Zhang, Z Huang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, large-scale pre-trained language-image models like CLIP have shown
extraordinary capabilities for understanding spatial contents, but naively transferring such …

Inherent redundancy in spiking neural networks

M Yao, J Hu, G Zhao, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spiking Neural Networks (SNNs) are well known as a promising energy-efficient
alternative to conventional artificial neural networks. Subject to the preconceived impression …

Towards real-world visual tracking with temporal contexts

Z Cao, Z Huang, L Pan, S Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual tracking has made significant improvements in the past few decades. Most existing
state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking …

MAR: Masked autoencoders for efficient action recognition

Z Qing, S Zhang, Z Huang, X Wang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Standard approaches for video action recognition usually operate on full input videos, which
is inefficient due to the widespread spatio-temporal redundancy in videos. The recent …

Spike-based dynamic computing with asynchronous sensing-computing neuromorphic chip

M Yao, O Richter, G Zhao, N Qiao, Y Xing… - Nature …, 2024 - nature.com
By mimicking the neurons and synapses of the human brain and employing spiking neural
networks on neuromorphic chips, neuromorphic computing offers a promising energy …

CDC-YOLOFusion: Leveraging cross-scale dynamic convolution fusion for visible-infrared object detection

Z Wang, X Liao, J Yuan, Y Yao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Feature-level fusion methods have demonstrated superior performance for visible-infrared
object detection due to the deep exploration of visible and infrared features. However, most …

Transformer meets remote sensing video detection and tracking: A comprehensive survey

L Jiao, X Zhang, X Liu, F Liu, S Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Transformer has shown excellent performance in remote sensing field with long-range
modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …