A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

A survey of speaker recognition: Fundamental theories, recognition methods and opportunities

MM Kabir, MF Mridha, J Shin, I Jahan, AQ Ohi - IEEE Access, 2021 - ieeexplore.ieee.org
Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital devices. Acquiring this congenital human competency, authentication technologies …

Speaker diarization: A review of recent research

X Anguera, S Bozonnet, N Evans… - … on audio, speech …, 2012 - ieeexplore.ieee.org
Speaker diarization is the task of determining “who spoke when?” in an audio or video
recording that contains an unknown amount of speech and also an unknown number of …

Deep learning for video classification and captioning

Z Wu, T Yao, Y Fu, YG Jiang - Frontiers of multimedia research, 2017 - dl.acm.org
Today's digital contents are inherently multimedia: text, audio, image, video, and so on.
Video, in particular, has become a new way of communication between Internet users with …

Speech recognition model construction method, speech recognition method, computer system, speech recognition apparatus, program, and recording medium

G Kurata, T Nagano, M Suzuki, R Tachibana - US Patent 9,812,122, 2017 - Google Patents
A construction method for a speech recognition model, in which a computer
system includes; a step of acquiring alignment between speech of each of a plurality of …