Deep reinforcement learning in computer vision: a comprehensive survey

N Le, VS Rathour, K Yamazaki, K Luu… - Artificial Intelligence …, 2022 - Springer
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …

Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook

B Lin - Expert Systems with Applications, 2024 - Elsevier
In recent years, reinforcement learning and bandits have transformed a wide range of
real-world applications including healthcare, finance, recommendation systems, robotics, and …

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

P Wu, S Xie - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
When we look around and perform complex tasks, how we see and selectively process what
we see is crucial. However, the lack of this visual search mechanism in current multimodal …

Spherical convolution empowered viewport prediction in 360 video multicast with limited FoV feedback

J Li, L Han, C Zhang, Q Li, Z Liu - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Field of view (FoV) prediction is critical in 360-degree video multicast, which is a key
component of the emerging virtual reality and augmented reality applications. Most of the …

Gazeformer: Scalable, effective and fast prediction of goal-directed human attention

S Mondal, Z Yang, S Ahn, D Samaras… - Proceedings of the …, 2023 - openaccess.thecvf.com
Predicting human gaze is important in Human-Computer Interaction (HCI). However, to
practically serve HCI applications, gaze prediction models must be scalable, fast, and …

ScanDMM: A deep Markov model of scanpath prediction for 360° images

X Sui, Y Fang, H Zhu, S Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scanpath prediction for 360° images aims to produce dynamic gaze behaviors based on
the human visual perception mechanism. Most existing scanpath prediction methods for …

DeepGaze III: Modeling free-viewing human scanpaths with deep learning

M Kümmerer, M Bethge, TSA Wallis - Journal of Vision, 2022 - iovs.arvojournals.org
Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we
present DeepGaze III, a new model that predicts the spatial location of consecutive fixations …

Automatic gaze analysis: A survey of deep learning based approaches

S Ghosh, A Dhall, M Hayat, J Knibbe… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Eye gaze analysis is an important research problem in the field of Computer Vision and
Human-Computer Interaction. Even with notable progress in the last 10 years, automatic …

DRIVE: Deep reinforced accident anticipation with visual explanation

W Bao, Q Yu, Y Kong - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a
future accident from dashcam videos, which is vital for a safety-guaranteed self-driving …

Predicting human scanpaths in visual question answering

X Chen, M Jiang, Q Zhao - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Attention has been an important mechanism for both humans and computer vision systems.
While state-of-the-art models to predict attention focus on estimating a static probabilistic …