Google Scholar

Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org

Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Cited by 85 · All 5 versions

Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Cited by 194 · All 5 versions

Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario

I Medennikov, M Korenevsky, T Prisyach… - ar…

… speech in a diarization system.
First, we detail a neural Long Short-Term Memory-based architecture for overlap detection …

Cited by 125 · All 4 versions

Voice activity detection in the wild: A data-driven approach using teacher-student training

H Dinkel, S Wang, X Xu, M Wu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org

Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …

Cited by 55 · All 5 versions

Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement

R Giri, S Venkataramani, JM Valin, U Isik… - arxiv preprint arxiv …, 2021 - arxiv.org

The presence of multiple talkers in the surrounding environment poses a difficult challenge
for real-time speech communication systems considering the constraints on network size …

Cited by 43 · All 12 versions

MarbleNet: Deep 1D time-channel separable convolutional neural network for voice activity detection

F Jia, S Majumdar, B Ginsburg - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD).
MarbleNet is a deep residual network composed from blocks of 1D time-channel separable …

Cited by 64 · All 3 versions

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …

Cited by 33 · All 7 versions

Saved to library

Personal VAD: Speaker-conditioned voice activity detection

Neural target speech extraction: An overview

Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection

Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario

Voice activity detection in the wild: A data-driven approach using teacher-student training

Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement

MarbleNet: Deep 1D time-channel separable convolutional neural network for voice activity detection

End-to-end active speaker detection