Deep clustering: Discriminative embeddings for segmentation and separation

JR Hershey, Z Chen, J Le Roux… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
We address the problem of "cocktail-party" source separation in a deep learning framework
called deep clustering. Previous deep network approaches to separation have shown …

Audio-visual scene analysis with self-supervised multisensory features

A Owens, AA Efros - Proceedings of the European …, 2018 - openaccess.thecvf.com
The thud of a bouncing ball, the onset of speech as lips open--when visual and audio events
occur together, it suggests that there might be a common, underlying event that produced …

[BOOK] Automatic speech recognition

D Yu, L Deng - 2016 - Springer
Automatic Speech Recognition (ASR), which is aimed to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …

Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

D Yu, M Kolbæk, ZH Tan… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
We propose a novel deep learning training criterion, named permutation invariant training
(PIT), for speaker independent multi-talker speech separation, commonly known as the …

Selective cortical representation of attended speaker in multi-talker speech perception

N Mesgarani, EF Chang - Nature, 2012 - nature.com
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker
background. How the auditory system manages to extract intelligible speech under such …

Past review, current progress, and challenges ahead on the cocktail party problem

Y Qian, C Weng, X Chang, S Wang, D Yu - Frontiers of Information …, 2018 - Springer
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …

Single-channel multi-speaker separation using deep clustering

Y Isik, J Le Roux, Z Chen, S Watanabe… - arXiv preprint arXiv …, 2016 - arxiv.org
Deep clustering is a recently introduced deep learning architecture that uses discriminatively
trained embeddings as the basis for clustering. It was recently applied to spectrogram …

Recent progresses in deep learning based acoustic models

D Yu, J Li - IEEE/CAA Journal of Automatica Sinica, 2017 - ieeexplore.ieee.org
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …

Treefrogs as animal models for research on auditory scene analysis and the cocktail party problem

MA Bee - International Journal of Psychophysiology, 2015 - Elsevier
The perceptual analysis of acoustic scenes involves binding together sounds from the same
source and separating them from other sounds in the environment. In large social groups …