A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023‏ - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022‏ - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Ego4d: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022‏ - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

[HTML][HTML] A survey of sound source localization with deep learning methods

PA Grumiaux, S Kitić, L Girin, A Guérin - The Journal of the Acoustical …, 2022‏ - pubs.aip.org
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021‏ - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings

S Watanabe, M Mandel, J Barker, E Vincent… - arxiv preprint arxiv …, 2020‏ - arxiv.org
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …

Wavesplit: End-to-end speech separation by speaker clustering

N Zeghidour, D Grangier - IEEE/ACM Transactions on Audio …, 2021‏ - ieeexplore.ieee.org
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …

End-to-end neural speaker diarization with self-attention

Y Fujita, N Kanda, S Horiguchi, Y Xue… - 2019 IEEE Automatic …, 2019‏ - ieeexplore.ieee.org
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems; ie,(i) it is not …

End-to-end neural speaker diarization with permutation-free objectives

Y Fujita, N Kanda, S Horiguchi, K Nagamatsu… - arxiv preprint arxiv …, 2019‏ - arxiv.org
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …

pyaudioanalysis: An open-source python library for audio signal analysis

T Giannakopoulos - PloS one, 2015‏ - journals.plos.org
Audio information plays a rather important role in the increasing digital content that is
available today, resulting in a need for methodologies that automatically analyze such …