Fixation prediction through multimodal analysis

X Min, G Zhai, K Gu, X Yang - ACM Transactions on Multimedia …, 2016 - dl.acm.org
In this article, we propose to predict human eye fixation through incorporating both audio
and visual cues. Traditional visual attention models generally make the utmost of stimuli's …

Fusion of magnetic and visual sensors for indoor localization: Infrastructure-free and more effective

Z Liu, L Zhang, Q Liu, Y Yin, L Cheng… - IEEE Transactions …, 2016 - ieeexplore.ieee.org
Accurate and infrastructure-free indoor positioning can be very useful in a variety of
applications. However, most existing approaches (e.g., WiFi and infrared-based methods) for …

Look&listen: Multi-modal correlation learning for active speaker detection and speech enhancement

J Xiong, Y Zhou, P Zhang, L Xie… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Active speaker detection and speech enhancement have become two increasingly attractive
topics in audio-visual scenario understanding. According to their respective characteristics …

Auxiliary classifier generative adversarial network with soft labels in imbalanced acoustic event detection

X Xia, R Togneri, F Sohel… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In acoustic event detection, the training data size of some acoustic events is often small and
imbalanced. To deal with this, this paper proposes generating the virtual training data …

Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN

S Nainan, V Kulkarni - International Journal of Speech Technology, 2021 - Springer
Contemporary automatic speaker recognition (ASR) systems do not provide 100% accuracy
making it imperative to explore different techniques to improve it. Easy access to mobile …

Introduction of SVM algorithms and recent applications about fault diagnosis and other aspects

Z Yin, J Liu, M Krueger, H Gao - 2015 IEEE 13th International …, 2015 - ieeexplore.ieee.org
Support vector machine has obtained more and more attentions as a new method of
machine learning based on the statistic learning theory. At the same time, there are …

Multimodal multi-channel on-line speaker diarization using sensor fusion through SVM

VP Minotto, CR Jung, B Lee - IEEE Transactions on Multimedia, 2015 - ieeexplore.ieee.org
Speaker diarization (SD) is the process of assigning speech segments of an audio stream to
its corresponding speakers, thus comprising the problem of voice activity detection (VAD) …

Sound source localization in wide-range outdoor environment using distributed sensor network

MM Faraji, SB Shouraki, E Iranmehr… - IEEE Sensors …, 2019 - ieeexplore.ieee.org
Sound source localization has always been one of the most challenging subjects in different
fields of engineering, one of the most important of which being tracking of flying objects. This …

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

Multimodal fusion refiner networks

S Sankaran, D Yang, SN Lim - arXiv preprint arXiv:2104.03435, 2021 - arxiv.org
Tasks that rely on multi-modal information typically include a fusion module that combines
information from different modalities. In this work, we develop a Refiner Fusion Network …