Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Wavlm: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) has achieved great success in speech recognition, while only
limited exploration has been attempted for other speech processing tasks. As speech signal …

Hubert: Self-supervised speech representation learning by masked prediction of hidden units

WN Hsu, B Bolte, YHH Tsai, K Lakhotia… - … ACM Transactions on …, 2021 - ieeexplore.ieee.org
Self-supervised approaches for speech representation learning are challenged by three
unique problems: (1) there are multiple sound units in each input utterance, (2) there is no …

On generative spoken language modeling from raw audio

K Lakhotia, E Kharitonov, WN Hsu, Y Adi… - Transactions of the …, 2021 - direct.mit.edu
We introduce Generative Spoken Language Modeling, the task of learning the
acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and …

Dynamical variational autoencoders: A comprehensive review

L Girin, S Leglaive, X Bie, J Diard, T Hueber… - arXiv preprint arXiv …, 2020 - arxiv.org
Variational autoencoders (VAEs) are powerful deep generative models widely used to
represent high-dimensional complex data through a low-dimensional latent space learned …

An unsupervised autoregressive model for speech representation learning

YA Chung, WN Hsu, H Tang, J Glass - arXiv preprint arXiv:1904.03240, 2019 - arxiv.org
This paper proposes a novel unsupervised autoregressive neural model for learning generic
speech representations. In contrast to other speech representation learning methods that …

HuBERT: How much can a bad teacher benefit ASR pre-training?

WN Hsu, YHH Tsai, B Bolte… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Compared to vision and language applications, self-supervised pre-training approaches for
ASR are challenged by three unique problems: (1) There are multiple sound units in each …

Unsupervised learning of disentangled and interpretable representations from sequential data

WN Hsu, Y Zhang, J Glass - Advances in neural information …, 2017 - proceedings.neurips.cc
We present a factorized hierarchical variational autoencoder, which learns disentangled and
interpretable representations from sequential data without supervision. Specifically, we …

Learning latent representations for style control and transfer in end-to-end speech synthesis

YJ Zhang, S Pan, L He, ZH Ling - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
In this paper, we introduce the Variational Autoencoder (VAE) into an end-to-end speech
synthesis model to learn the latent representation of speaking styles in an unsupervised …