Google Scholar

Audiobox: Unified audio generation with natural language prompts

A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing …

Cited by 91 · All versions (2)

[PDF] arxiv.org

Natural language guidance of high-fidelity text-to-speech with synthetic annotations

D Lyth, S King - arXiv preprint arXiv:2402.01912, 2024 - arxiv.org

Text-to-speech models trained on large-scale datasets have demonstrated impressive in-context learning capabilities and naturalness. However, control of speaker identity and style …

Cited by 34 · All versions (2)

[PDF] arxiv.org

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

Cited by 17 · All versions (6)

[PDF] arxiv.org

Spectral codecs: Spectrogram-based audio codecs for high quality speech synthesis

R Langman, A Jukić, K Dhawan, NR Koluguri… - arXiv preprint arXiv …, 2024 - arxiv.org

Historically, most speech models in machine-learning have used the mel-spectrogram as a
speech representation. Recently, discrete audio tokens produced by neural audio codecs …

Cited by 9 · All versions (2)

[PDF] arxiv.org

Self-supervised speech quality estimation and enhancement using only clean speech

SW Fu, KH Hung, Y Tsao, YCF Wang - arXiv preprint arXiv:2402.16321, 2024 - arxiv.org

Speech quality estimation has recently undergone a paradigm shift from human-hearing
expert designs to machine-learning models. However, current models rely mainly on …

Cited by 12 · All versions (4)

[PDF] arxiv.org

SpeechPrompt: Prompting speech language models for speech processing tasks

KW Chang, H Wu, YK Wang, YK Wu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Prompting has become a practical method for utilizing pre-trained language models (LMs).
This approach offers several advantages. It allows an LM to adapt to new tasks with minimal …

Cited by 2 · All versions (7)

[PDF] arxiv.org

Low frame-rate speech codec: a codec designed for fast high-quality speech LLM training and inference

E Casanova, R Langman, P Neekhara… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have significantly advanced audio processing through
audio codecs that convert audio into discrete tokens, enabling the application of language …

Cited by 3 · All versions (2)

[PDF] arxiv.org

Generative speech foundation model pretraining for high-quality speech extraction and restoration

PJ Ku, AH Liu, R Korostik, SF Huang, SW Fu… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper proposes a generative pretraining foundation model for high-quality speech
restoration tasks. By directly operating on complex-valued short-time Fourier transform …

Cited by 2 · All versions (3)

[PDF] arxiv.org

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

J Shi, H Shim, J Tian, S Arora, H Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for
various speech, audio, and music signals. The toolkit features a Pythonic interface with …

Cited by 2 · All versions (2)

[PDF] arxiv.org

AV2Wav: Diffusion-based re-synthesis from continuous self-supervised features for audio-visual speech enhancement

JC Chou, CM Chien, K Livescu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Speech enhancement systems are typically trained using pairs of clean and noisy speech. In
audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data …

Cited by 4 · All versions (5)

TorchAudio-Squim: Reference-less speech quality and intelligibility measures in TorchAudio