Google Scholar

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Cited by 49 · All 4 versions · Web of Science: 4

[PDF] arxiv.org

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z **, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Cited by 15 · All 7 versions · Web of Science: 1

[PDF] sjtu.edu.cn

Unified cross-modal attention: robust audio-visual speech recognition and beyond

J Li, C Li, Y Wu, Y Qian - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

Audio-Visual Speech Recognition (AVSR) is a promising approach to improving the
accuracy and robustness of speech recognition systems with the assistance of visual cues in …

Cited by 5 · All 3 versions · Web of Science: 1

[PDF] aaai.org

Mx2m: masked cross-modality modeling in domain adaptation for 3d semantic segmentation

B Zhang, Z Wang, Y Ling, Y Guan, S Zhang… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Existing methods of cross-modal domain adaptation for 3D semantic segmentation predict
results only via 2D-3D complementarity that is obtained by cross-modal feature matching …

Cited by 6 · All 6 versions

[PDF] arxiv.org

Scenario-aware audio-visual TF-Gridnet for target speech extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …

Cited by 5 · All 9 versions

[PDF] arxiv.org

ImagineNet: Target speaker extraction with intermittent visual cue through embedding inpainting

Z Pan, W Wang, M Borsdorf, H Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

The speaker extraction technique seeks to single out the voice of a target speaker from the
interfering voices in a speech mixture. Typically an auxiliary reference of the target speaker …

Cited by 9 · All 3 versions

[PDF] arxiv.org

LSTMSE-Net: Long Short Term Speech Enhancement Network for Audio-visual Speech Enhancement

A Jain, JS Sanjotra, H Choudhary, K Agrawal… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we propose long short term memory speech enhancement network (LSTMSE-Net), an audio-visual speech enhancement (AVSE) method. This innovative method …

Cited by 2 · All 6 versions

Efficient audio–visual information fusion using encoding pace synchronization for Audio–Visual Speech Separation

X Xu, W Tu, Y Yang - Information Fusion, 2025 - Elsevier

Contemporary audio–visual speech separation (AVSS) models typically use encoders that
merge audio and visual representations by concatenating them at a specific layer. This …

[PDF] arxiv.org

Deep complex u-net with conformer for audio-visual speech enhancement

S Ahmed, CW Chen, W Ren, CJ Li, E Chu… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent studies have increasingly acknowledged the advantages of incorporating visual data
into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE …

Cited by 2 · All 6 versions

[PDF] arxiv.org

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues

J Li, K Zhang, S Wang, KA Lee, H Li - arXiv preprint arXiv:2412.08247, 2024 - arxiv.org

Audio-visual Target Speaker Extraction (AV-TSE) aims to isolate the speech of a specific
target speaker from an audio mixture using time-synchronized visual cues. In real-world …

All 2 versions

Saved to My library

Time-domain audio-visual speech separation on low quality videos

USEV: Universal speaker extraction with visual cue

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

Unified cross-modal attention: robust audio-visual speech recognition and beyond

Mx2m: masked cross-modality modeling in domain adaptation for 3d semantic segmentation

Scenario-aware audio-visual TF-Gridnet for target speech extraction

ImagineNet: Target speaker extraction with intermittent visual cue through embedding inpainting

LSTMSE-Net: Long Short Term Speech Enhancement Network for Audio-visual Speech Enhancement

Efficient audio–visual information fusion using encoding pace synchronization for Audio–Visual Speech Separation

Deep complex u-net with conformer for audio-visual speech enhancement

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues