- Academic Search

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org

Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Cited by 162

A survey on voice assistant security: Attacks and countermeasures

C Yan, X Ji, K Wang, Q Jiang, Z Jin, W Xu - ACM Computing Surveys, 2022 - dl.acm.org

Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …

Cited by 64

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

Cited by 772

Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

Cited by 3436

Tacotron: Towards end-to-end speech synthesis

Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org

A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …

Cited by 2306

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Cited by 5982

Speaker-dependent WaveNet vocoder

A Tamamori, T Hayashi, K Kobayashi, K Takeda… - Interspeech, 2017 - isca-archive.org

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …

Cited by 342

Tacotron: A fully end-to-end text-to-speech synthesis model

Y Wang, RJ Skerry-Ryan… - arXiv preprint …, 2017 - bengio.abracadoudou.com

ABSTRACT A text-to-speech synthesis system typically consists of multiple stages, such as a
text analysis frontend, an acoustic model and an audio synthesis module. Building these …

Cited by 294

Shallow-Fusion End-to-End Contextual Biasing

D Zhao, TN Sainath, D Rybach, P Rondon, D Bhatia… - Interspeech, 2019 - isca-archive.org

Contextual biasing to a specific domain, including a user's song names, app names and
contact names, is an important component of any production-level automatic speech …

Cited by 175

Two-pass end-to-end speech recognition

TN Sainath, R Pang, D Rybach, Y He… - arXiv preprint arXiv …, 2019 - arxiv.org

The requirements for many applications of state-of-the-art speech recognition systems
include not only low word error rate (WER) but also low latency. Specifically, for many use …

Cited by 173

Saved to My library

Recent advances in Google real-time HMM-driven unit selection synthesizer

Speech technology for healthcare: Opportunities, challenges, and state of the art
