A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

Visualization and visual analytics approaches for image and video datasets: A survey

S Afzal, S Ghani, MM Hittawe, SF Rashid… - ACM Transactions on …, 2023 - dl.acm.org
Image and video data analysis has become an increasingly important research area with
applications in different domains such as security surveillance, healthcare, augmented and …

Multimodal few-shot learning with frozen language models

M Tsimpoukelli, JL Menick, S Cabi… - Advances in …, 2021 - proceedings.neurips.cc
When trained at sufficient scale, auto-regressive language models exhibit the notable ability
to learn a new language task after being prompted with just a few examples. Here, we …

Evaluation of text generation: A survey

A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org
The paper surveys evaluation methods of natural language generation (NLG) systems that
have been developed in the last few years. We group NLG evaluation methods into three …

Learning with noisy correspondence for cross-modal matching

Z Huang, G Niu, X Liu, W Ding… - Advances in Neural …, 2021 - proceedings.neurips.cc
Cross-modal matching, which aims to establish the correspondence between two different
modalities, is fundamental to a variety of tasks such as cross-modal retrieval and vision-and …

BiCro: Noisy correspondence rectification for multi-modality data via bi-directional cross-modal similarity consistency

S Yang, Z Xu, K Wang, Y You, H Yao… - Proceedings of the …, 2023 - openaccess.thecvf.com
As one of the most fundamental techniques in multimodal learning, cross-modal matching
aims to project various sensory modalities into a shared feature space. To achieve this …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Video description: A comprehensive survey of deep learning approaches

G Rafiq, M Rafiq, GS Choi - Artificial Intelligence Review, 2023 - Springer
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …

Generative AI in mobile networks: a survey

A Karapantelakis, P Alizadeh, A Alabassi, K Dey… - Annals of …, 2024 - Springer
This paper provides a comprehensive review of recent challenges and results in the field of
generative AI with application to mobile telecommunications networks. The objective is to …

PSNet: Parallel symmetric network for video salient object detection

R Cong, W Song, J Lei, G Yue, Y Zhao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
For the video salient object detection (VSOD) task, how to excavate the information from the
appearance modality and the motion modality has always been a topic of great concern. The …