A review of speech-centric trustworthy machine learning: Privacy, safety, and fairness

T Feng, R Hebbar, N Mehlman, X Shi… - … on Signal and …, 2023 - nowpublishers.com
Speech-centric machine learning systems have revolutionized a number of leading
industries ranging from transportation and healthcare to education and defense …

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

VoxSRC 2020: The second VoxCeleb speaker recognition challenge

A Nagrani, JS Chung, J Huh, A Brown, E Coto… - arXiv preprint arXiv …, 2020 - arxiv.org
We held the second installment of the VoxCeleb Speaker Recognition Challenge in
conjunction with Interspeech 2020. The goal of this challenge was to assess how well …

VoxSRC 2021: The third VoxCeleb speaker recognition challenge

A Brown, J Huh, JS Chung, A Nagrani… - arXiv preprint arXiv …, 2022 - arxiv.org
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …

The VoxCeleb Speaker Recognition Challenge: A Retrospective

J Huh, JS Chung, A Nagrani, A Brown… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and
workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the …

MovieCuts: A new dataset and benchmark for cut type recognition

A Pardo, FC Heilbron, JL Alcázar, A Thabet… - … on Computer Vision, 2022 - Springer
Understanding movies and their structural patterns is a crucial task in decoding the craft of
video editing. While previous works have developed tools for general analysis, such as …

Learning to cut by watching movies

A Pardo, F Caba, JL Alcázar… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …

Coco-Nut: Corpus of Japanese utterance and voice characteristics description for prompt-based control

A Watanabe, S Takamichi, Y Saito… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In text-to-speech, controlling voice characteristics is important in achieving various-purpose
speech synthesis. Considering the success of text-conditioned generation, such as text-to …

Speaker verification using attentive multi-scale convolutional recurrent network

Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …

VoxBlink: A large scale speaker verification dataset on camera

Y Lin, X Qin, G Zhao, M Cheng, N Jiang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a large-scale and high-quality audiovisual speaker verification
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …