How to design a three-stage architecture for audio-visual active speaker detection in the wild

O Köpüklü, M Taseska, G Rigoll - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Successful active speaker detection requires a three-stage pipeline: (i) audio-visual
encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference …

A Survey of Human-Object Interaction Detection With Deep Learning

G Han, J Zhao, L Zhang, F Deng - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Human-object interaction (HOI) detection has attracted significant attention due to its wide
applications, including human-robot interactions, security monitoring, automatic sports …

Multi-view hand-hygiene recognition for food safety

C Zhong, AR Reibman, HA Mina, AJ Deering - Journal of Imaging, 2020 - mdpi.com
A majority of foodborne illnesses result from inappropriate food handling practices. One
proven practice to reduce pathogens is to perform effective hand-hygiene before all stages …

Scene separation & data selection: Temporal segmentation algorithm for real-time video stream analysis

Y **n, Z Zhou, Y **a - arXiv preprint arXiv:2308.00210, 2023 - arxiv.org
We present 2SDS (Scene Separation and Data Selection algorithm), a temporal
segmentation algorithm used in real-time video stream interpretation. It complements CNN …

Real-time Architecture for Audio-Visual Active Speaker Detection

M Huang, W Wang, Z Lin, FB Tesema… - … on Robotics and …, 2022 - ieeexplore.ieee.org
Continuously measuring the speaking state of users with a robot in a human-robot interaction
(HRI) system improves metrics of interaction quality. Meanwhile, mainstream active speaker …

Dynamic gesture recognition based on temporal shift module

Z Liu, H Li - … Conference on Artificial Intelligence and Computer …, 2023 - spiedigitallibrary.org
Dynamic gesture recognition is a very important interaction method in human-computer
interaction. For the current research, multi-modal data and three-dimensional convolutional …

Towards Efficient Human Activity Recognition

O Köpüklü - 2022 - researchgate.net
The amount of generated video data grows at ever-increasing rates dominating the majority
of internet traffic. Therefore, the ability to automatically analyze video data effectively and …

Video processing for safe food handling

C Zhong - 2021 - search.proquest.com
A majority of foodborne illnesses result from inappropriate food handling practices. One
proven practice to reduce pathogens is to perform effective hand-hygiene before all stages …