[PDF][PDF] A review of speech-centric trustworthy machine learning: Privacy, safety, and fairness
Speech-centric machine learning systems have revolutionized a number of leading
industries ranging from transportation and healthcare to education and defense …
industries ranging from transportation and healthcare to education and defense …
Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …
By thoroughly and accurately modeling this information, it can be utilized in various …
Voxsrc 2020: The second voxceleb speaker recognition challenge
We held the second installment of the VoxCeleb Speaker Recognition Challenge in
conjunction with Interspeech 2020. The goal of this challenge was to assess how well …
conjunction with Interspeech 2020. The goal of this challenge was to assess how well …
Voxsrc 2021: The third voxceleb speaker recognition challenge
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …
The Vox Celeb Speaker Recognition Challenge: A Retrospective
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and
workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the …
workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the …
Moviecuts: A new dataset and benchmark for cut type recognition
Understanding movies and their structural patterns is a crucial task in decoding the craft of
video editing. While previous works have developed tools for general analysis, such as …
video editing. While previous works have developed tools for general analysis, such as …
Learning to cut by watching movies
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …
remains challenging and requires non-trivial video editing expertise. Many video editing …
Coco-nut: Corpus of japanese utterance and voice characteristics description for prompt-based control
In text-to-speech, controlling voice characteristics is important in achieving various-purpose
speech synthesis. Considering the success of text-conditioned generation, such as text-to …
speech synthesis. Considering the success of text-conditioned generation, such as text-to …
Speaker verification using attentive multi-scale convolutional recurrent network
Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …
Voxblink: A large scale speaker verification dataset on camera
In this paper, we introduce a large-scale and high-quality audiovisual speaker verification
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …