Google Scholar

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org

Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Cited by 441 - All 4 versions

[PDF] arxiv.org

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 306 - All 6 versions

[PDF] arxiv.org

Separate anything you describe

X Liu, Q Kong, Y Zhao, H Liu, Y Yuan… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …

Cited by 42 - All 8 versions

[PDF] arxiv.org

Spex+: A complete time domain speaker extraction network

M Ge, C Xu, L Wang, ES Chng, J Dang, H Li - arXiv preprint arXiv …, 2020 - arxiv.org

Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …

Cited by 170 - All 8 versions

[PDF] arxiv.org

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org

Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

Cited by 117 - All 6 versions

Fusion of tactile and visual information in deep learning models for object recognition

RP Babadian, K Faez, M Amiri, E Falotico - Information Fusion, 2023 - Elsevier

Humans use multimodal sensory information to understand the physical properties of their
environment. Intelligent decision-making systems such as the ones used in robotic …

Cited by 30 - All 5 versions

[PDF] arxiv.org

Audio-visual recognition of overlapped speech for the LRS2 dataset

J Yu, SX Zhang, J Wu, S Ghorbani, B Wu… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Automatic recognition of overlapped speech remains a highly challenging task to date.
Motivated by the bimodal nature of human speech perception, this paper investigates the …

Cited by 107 - All 7 versions

[PDF] ieee.org

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-
talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Cited by 49 - All 4 versions

[PDF] arxiv.org

Advances in online audio-visual meeting transcription

T Yoshioka, I Abramovski, C Aksoylar… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

This paper describes a system that generates speaker-annotated transcripts of meetings by
using a microphone array and a 360-degree camera. The hallmark of the system is its ability …

Cited by 91 - All 6 versions

[PDF] ieee.org

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Cited by 17 - All 5 versions

Time domain audio visual speech separation

Multimodal intelligence: Representation learning, information fusion, and applications