Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023‏ - Elsevier‏

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …‏

Cited by 235; 6 versions

[PDF] arxiv.org

Measuring disentanglement: A review of metrics‏

MA Carbonneau, J Zaidi, J Boilard… - IEEE transactions on …, 2022‏ - ieeexplore.ieee.org‏

Learning to disentangle and represent factors of variation in data is an important problem in
artificial intelligence. While many advances have been made to learn these representations …‏

Cited by 102; 8 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale‏

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024‏ - proceedings.neurips.cc‏

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …‏

Cited by 253; 8 versions

[PDF] arxiv.org

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arxiv preprint arxiv …, 2019‏ - arxiv.org‏

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …‏

Cited by 1052; 10 versions

[PDF] arxiv.org

Unsupervised speech representation learning using WaveNet autoencoders

J Chorowski, RJ Weiss, S Bengio… - … /ACM transactions on …, 2019‏ - ieeexplore.ieee.org‏

We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …‏

Cited by 417; 11 versions

[PDF] mlr.press

Meta-StyleSpeech: Multi-speaker adaptive text-to-speech generation

D Min, DB Lee, E Yang… - … Conference on Machine …, 2021‏ - proceedings.mlr.press‏

With rapid progress in neural text-to-speech (TTS) models, personalized speech generation
is now in high demand for many applications. For practical applicability, a TTS model should …‏

Cited by 178; 6 versions

[PDF] arxiv.org

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit‏

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020‏ - ieeexplore.ieee.org‏

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …‏

Cited by 247; 7 versions

[PDF] arxiv.org

Audiobox: Unified audio generation with natural language prompts‏

A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo… - arxiv preprint arxiv …, 2023‏ - arxiv.org‏

Audio is an essential part of our life, but creating it often requires expertise and is time-
consuming. Research communities have made great progress over the past year advancing …‏

Cited by 91; 2 versions

[PDF] thecvf.com

LipSync3D: Data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization

A Lahiri, V Kwatra, C Frueh, J Lewis… - Proceedings of the …, 2021‏ - openaccess.thecvf.com‏

In this paper, we present a video-based learning framework for animating personalized 3D
talking faces from audio. We introduce two training-time data normalizations that significantly …‏

Cited by 114; 8 versions

[PDF] wiley.com

ZeroEGGS: Zero‐shot Example‐based Gesture Generation from Speech‏

S Ghorbani, Y Ferstl, D Holden, NF Troje… - Computer Graphics …, 2023‏ - Wiley Online Library‏

We present ZeroEGGS, a neural network framework for speech‐driven gesture generation
with zero‐shot style control by example. This means style can be controlled via only a short …‏

Cited by 73; 7 versions

Hierarchical generative modeling for controllable speech synthesis

A review of deep learning techniques for speech processing‏
