Robotic vision for human-robot interaction and collaboration: A survey and systematic review

N Robinson, B Tidd, D Campbell, D Kulić… - ACM Transactions on …, 2023 - dl.acm.org
Robotic vision, otherwise known as computer vision for robots, is a critical process for robots
to collect and interpret detailed information related to human actions, goals, and …

Audiovisual fusion: Challenges and new approaches

AK Katsaggelos, S Bahaadini… - Proceedings of the …, 2015 - ieeexplore.ieee.org
In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of
the challenges and report on approaches to address them. One important issue in AV fusion …

Unicon: Unified context network for robust active speaker detection

Y Zhang, S Liang, S Yang, X Liu, Z Wu, S Shan… - Proceedings of the 29th …, 2021 - dl.acm.org
We propose a new efficient framework, the Unified Context Network (UniCon), for robust
active speaker detection (ASD). Traditional methods for ASD usually operate on each …

Co-localization of audio sources in images using binaural features and locally-linear regression

A Deleforge, R Horaud… - … /ACM Transactions on …, 2015 - ieeexplore.ieee.org
This paper addresses the problem of localizing audio sources using binaural
measurements. We propose a supervised formulation that simultaneously localizes multiple …

ChildBot: Multi-robot perception and interaction with children

N Efthymiou, PP Filntisis, P Koutras, A Tsiami… - Robotics and …, 2022 - Elsevier
In this paper, we present an integrated robotic system capable of participating in and
performing a wide range of educational and entertainment tasks collaborating with one or …

Who's speaking? Audio-supervised classification of active speakers in video

P Chakravarty, S Mirzaei, T Tuytelaars… - Proceedings of the …, 2015 - dl.acm.org
Active speakers have traditionally been identified in video by detecting their moving lips.
This paper demonstrates the same using spatio-temporal features that aim to capture other …

Mixture of inference networks for VAE-based audio-visual speech enhancement

M Sadeghi, X Alameda-Pineda - IEEE Transactions on Signal …, 2021 - ieeexplore.ieee.org
We address unsupervised audio-visual speech enhancement based on variational
autoencoders (VAEs), where the prior distribution of clean speech spectrogram is simulated …

Prediction of who will be next speaker and when using mouth-opening pattern in multi-party conversation

R Ishii, K Otsuka, S Kumano, R Higashinaka… - Multimodal …, 2019 - mdpi.com
We investigated the mouth-opening transition pattern (MOTP), which represents the change
of mouth-opening degree during the end of an utterance, and used it to predict the next …

Ava (a social robot): Design and performance of a robotic hearing apparatus

E Saffari, A Meghdari, B Vazirnezhad… - Social Robotics: 7th …, 2015 - Springer
Socially cognitive robots are supposed to communicate and interact with humans and other
robots in the most natural way. Listeners turn their heads toward speakers to enhance …