Deep reinforcement learning in computer vision: a comprehensive survey
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …
Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook
B Lin - Expert Systems with Applications, 2024 - Elsevier
In recent years, reinforcement learning and bandits have transformed a wide range of real-
world applications including healthcare, finance, recommendation systems, robotics, and …
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
When we look around and perform complex tasks, how we see and selectively process what
we see is crucial. However, the lack of this visual search mechanism in current multimodal …
Spherical convolution empowered viewport prediction in 360 video multicast with limited FoV feedback
J Li, L Han, C Zhang, Q Li, Z Liu - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Field of view (FoV) prediction is critical in 360-degree video multicast, which is a key
component of the emerging virtual reality and augmented reality applications. Most of the …
Gazeformer: Scalable, effective and fast prediction of goal-directed human attention
Predicting human gaze is important in Human-Computer Interaction (HCI). However, to
practically serve HCI applications, gaze prediction models must be scalable, fast, and …
ScanDMM: A deep Markov model of scanpath prediction for 360° images
Scanpath prediction for 360° images aims to produce dynamic gaze behaviors based on
the human visual perception mechanism. Most existing scanpath prediction methods for …
DeepGaze III: Modeling free-viewing human scanpaths with deep learning
Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we
present DeepGaze III, a new model that predicts the spatial location of consecutive fixations …
Automatic gaze analysis: A survey of deep learning based approaches
Eye gaze analysis is an important research problem in the field of Computer Vision and
Human-Computer Interaction. Even with notable progress in the last 10 years, automatic …
DRIVE: Deep reinforced accident anticipation with visual explanation
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a
future accident from dashcam videos, which is vital for a safety-guaranteed self-driving …
Predicting human scanpaths in visual question answering
Attention has been an important mechanism for both humans and computer vision systems.
While state-of-the-art models to predict attention focus on estimating a static probabilistic …