Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 711

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2024 - proceedings.neurips.cc

Abstract Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Cited by 248

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 71

Moshi: a speech-text foundation model for real-time dialogue

A Défossez, L Mazaré, M Orsini, A Royer… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue
framework. Current systems for spoken dialogue rely on pipelines of independent …

Cited by 45

CASE-Net: Integrating local and non-local attention operations for speech enhancement

X Xu, W Tu, Y Yang - Speech Communication, 2023 - Elsevier

Local and non-local attention operations are two ubiquitous operations in the domain of
speech enhancement (SE), and they are effective to generate more discriminative patterns …

Cited by 17

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org

Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

Cited by 119

ICASSP 2023 acoustic echo cancellation challenge

R Cutler, A Saabas, T Pärnamaa… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org

The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research
in acoustic echo cancellation (AEC), which is an important area of speech enhancement and …

Cited by 93

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org

The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

Cited by 98

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …

Cited by 84
