- Academic Search

Audio-Language Datasets of Scenes and Events: A Survey

G Wijngaard, E Formisano, M Esposito… - IEEE …, 2025 - ieeexplore.ieee.org

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …

Leveraging audio-only data for text-queried target sound extraction

K Saijo, J Ebbers, FG Germain, S Khurana… - arXiv preprint arXiv …, 2024 - arxiv.org

The goal of text-queried target sound extraction (TSE) is to extract from a mixture a sound
source specified with a natural-language caption. While it is preferable to have access to …

Language-Queried Target Sound Extraction Without Parallel Training Data

H Ma, Z Peng, X Li, Y Li, M Shao, Q Kong… - arXiv preprint arXiv …, 2024 - arxiv.org

Language-queried target sound extraction (TSE) aims to extract specific sounds from
mixtures based on language queries. Traditional fully-supervised training schemes require …

Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues

D Choi, JW Choi - arXiv preprint arXiv:2409.12415, 2024 - arxiv.org

We propose a multichannel-to-multichannel target sound extraction (M2M-TSE) framework
for separating multichannel target signals from a multichannel mixture of sound sources …

SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model

C Hernandez-Olivan, M Delcroix, T Ochiai… - arXiv preprint arXiv …, 2024 - arxiv.org

Target sound extraction (TSE) consists of isolating a desired sound from a mixture of
arbitrary sounds using clues to identify it. A TSE system requires solving two problems at …

SRPOL submission to DCASE 2024 Challenge Task 9: modeling real and imaginary components, mixit and SDR based loss

M Romaniuk, J Krzywdziak - 2024 - dcase.community

We present our solution to the DCASE 2024 challenge task 9 (Language-Queried Audio
Source Separation). Our solution is based on the official baseline, with training dataset …

Beyond speaker identity: Text guided target speech extraction

M Huo, A Jain, CP Huynh, F Kong, P Wang… - arXiv preprint arXiv …, 2025 - arxiv.org

Target Speech Extraction (TSE) traditionally relies on explicit clues about the speaker's
identity like enrollment audio, face images, or videos, which may not always be available. In …

