Graph-based deep learning for medical diagnosis and analysis: past, present and future

D Ahmedt-Aristizabal, MA Armin, S Denman, C Fookes… - Sensors, 2021 - mdpi.com
With the advances of data-driven machine learning research, a wide variety of prediction
problems have been tackled. It has become critical to explore how machine learning and …

A survey on video-based human action recognition: recent updates, datasets, challenges, and applications

P Pareek, A Thakkar - Artificial Intelligence Review, 2021 - Springer
Human Action Recognition (HAR) involves human activity monitoring task in
different areas of medical, education, entertainment, visual surveillance, video retrieval, as …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European conference on …, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Vita-CLIP: Video and text adaptive CLIP via multimodal prompting

ST Wasim, M Naseer, S Khan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Adopting contrastive image-text pretrained models like CLIP towards video classification has
gained attention due to its cost-effectiveness and competitive performance. However, recent …

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI conference on …, 2023 - ojs.aaai.org
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

Fine-grained temporal contrastive learning for weakly-supervised temporal action localization

J Gao, M Chen, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
We target at the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite the recent progress, existing …

Graph convolutional tracking

J Gao, T Zhang, C Xu - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Tracking by siamese networks has achieved favorable performance in recent years.
However, most of existing siamese methods do not take full advantage of spatial-temporal …

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …