- Academic Search

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com

In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Cited by 18; 7 versions

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org

In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

Cited by 18; 6 versions

LoCoNet: Long-short context network for active speaker detection

X Wang, F Cheng, G Bertasius - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Active Speaker Detection (ASD) aims to identify who is speaking in each frame of a
video. Solving ASD involves using audio and visual information in two complementary …

Cited by 19; 3 versions

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

C Zhao, S Liu, K Mangalam, G Qian… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large pretrained models are increasingly crucial in modern computer vision tasks. These
models are typically used in downstream tasks by end-to-end finetuning which is highly …

Cited by 1; 5 versions

EgoLoc: Revisiting 3D object localization from egocentric videos with visual queries

J Mai, A Hamdi, S Giancola, C Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com

With the recent advances in video and 3D understanding, novel 4D spatio-temporal
methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic …

Cited by 14; 7 versions

Listen to look into the future: Audio-visual egocentric gaze anticipation

B Lai, F Ryan, W Jia, M Liu, JM Rehg - European Conference on Computer …, 2024 - Springer

Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …

Cited by 5; 2 versions

Joint audio-visual idling vehicle detection with streamlined input dependencies

X Li, R Mohammed, T Mangin, S Saha… - arXiv preprint arXiv …, 2024 - arxiv.org

Idling vehicle detection (IVD) can be helpful in monitoring and reducing unnecessary idling
and can be integrated into real-time systems to address the resulting pollution and harmful …

Cited by 3

BIAS: A Body-based Interpretable Active Speaker Approach

T Roxo, JC Costa, PRM Inácio… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

State-of-the-art Active Speaker Detection (ASD) approaches heavily rely on audio and facial
features to perform, which is not a sustainable approach in wild scenarios. Although these …

Cited by 1; 2 versions