Parallel waveform synthesis based on generative adversarial networks with voicing-aware condition...

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier

Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

Cited by 64 · All 2 versions

[PDF] arxiv.org

TTS-by-TTS: TTS-driven data augmentation for fast and high-quality speech synthesis

MJ Hwang, R Yamamoto, E Song… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

In this paper, we propose a text-to-speech (TTS)-driven data augmentation method for
improving the quality of a non-autoregressive (non-AR) TTS system. Recently proposed non-AR …

Cited by 41 · All 6 versions

[PDF] arxiv.org

Language model-based emotion prediction methods for emotional speech synthesis systems

HW Yoon, O Kwon, H Lee, R Yamamoto… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper proposes an effective emotional text-to-speech (TTS) system with a pre-trained
language model (LM)-based emotion prediction method. Unlike conventional systems that …

Cited by 15 · All 6 versions · HTML version

[PDF] arxiv.org

Modeling and driving human body soundfields through acoustic primitives

C Huang, D Marković, C Xu, A Richard - European Conference on …, 2024 - Springer

While rendering and animation of photorealistic 3D human body models have matured and
reached an impressive quality over the past years, modeling the spatial audio associated …

Cited by 1 · All 6 versions

[PDF] arxiv.org

N-Singer: A non-autoregressive Korean singing voice synthesis system for pronunciation enhancement

GH Lee, TW Kim, H Bae, MJ Lee, YI Kim… - arXiv preprint arXiv …, 2021 - arxiv.org

Recently, end-to-end Korean singing voice systems have been designed to generate
realistic singing voices. However, these systems still suffer from a lack of robustness in terms …

Cited by 20 · All 5 versions · HTML version

[PDF] ieee.org

High-fidelity and pitch-controllable neural vocoder based on unified source-filter networks

R Yoneyama, YC Wu, T Toda - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

We introduce unified source-filter generative adversarial networks (uSFGAN), a waveform
generative model conditioned on acoustic features, which represents the source-filter …

Cited by 4 · All 2 versions

[PDF] arxiv.org

Improved Parallel WaveGAN vocoder with perceptually weighted spectrogram loss

E Song, R Yamamoto, MJ Hwang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

This paper proposes a spectral-domain perceptual weighting technique for Parallel
WaveGAN-based text-to-speech (TTS) systems. The recently proposed Parallel WaveGAN …

Cited by 24 · All 5 versions

[PDF] neurips.cc

Sounding bodies: modeling 3D spatial sound of humans using body pose and audio

X Xu, D Markovic, J Sandakly… - Advances in …, 2024 - proceedings.neurips.cc

While 3D human body modeling has received much attention in computer vision, modeling
the acoustic equivalent, i.e., modeling 3D spatial audio produced by body motion and speech …

Cited by 1 · All 4 versions · HTML version

[PDF] github.io

High-Fidelity Parallel WaveGAN with Multi-Band Harmonic-Plus-Noise Model

MJ Hwang, R Yamamoto, E Song, JM Kim - Interspeech, 2021 - sewplay.github.io

This paper proposes a multi-band harmonic-plus-noise (HN) Parallel WaveGAN (PWG)
vocoder. To generate a high-fidelity speech signal, it is important to accurately reflect the harmonic …

Cited by 19 · All 6 versions · HTML version

[PDF] arxiv.org

Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis

K Futamata, B Park, R Yamamoto… - arXiv preprint arXiv …, 2021 - arxiv.org

We propose a novel phrase break prediction method that combines implicit features
extracted from a pre-trained large language model, a.k.a. BERT, and explicit features …

Cited by 21 · All 6 versions · HTML version
