Recent advances in convolutional neural networks

J Gu, Z Wang, J Kuen, L Ma, A Shahroudy, B Shuai… - Pattern recognition, 2018 - Elsevier
In the last few years, deep learning has led to very good performance on a variety of
problems, such as visual recognition, speech recognition and natural language processing …

[PDF][PDF] Wavenet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arxiv preprint arxiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Wavenet: A generative model for raw audio

A Oord, S Dieleman, H Zen, K Simonyan… - arxiv preprint arxiv …, 2016 - arxiv.org
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Merlin: An open source neural network speech synthesis system

Z Wu, O Watts, S King - 9th ISCA Speech Synthesis Workshop, 2016 - research.ed.ac.uk
We introduce the Merlin speech synthesis toolkit for neural network-based speech synthesis.
The system takes linguistic features as input, and employs neural networks to predict …

Moglow: Probabilistic and controllable motion synthesis using normalising flows

GE Henter, S Alexanderson, J Beskow - ACM Transactions on Graphics …, 2020 - dl.acm.org
Data-driven modelling and synthesis of motion is an active research area with applications
that include animation, games, and social robotics. This paper introduces a new class of …

Masked autoregressive flow for density estimation

G Papamakarios, T Pavlakou… - Advances in neural …, 2017 - proceedings.neurips.cc
Autoregressive models are among the best performing neural density estimators. We
describe an approach for increasing the flexibility of an autoregressive model, based on …

Made: Masked autoencoder for distribution estimation

M Germain, K Gregor, I Murray… - … on machine learning, 2015 - proceedings.mlr.press
There has been a lot of recent interest in designing neural network models to estimate a
distribution from a set of examples. We introduce a simple modification for autoencoder …

[PDF][PDF] Acoustic modeling in statistical parametric speech synthesis-from HMM to LSTM-RNN

H Zen - Proc. MLSLP, 2015 - research.google.com
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder
to render speech given a text. Typically decision tree-clustered context-dependent hidden …

Investigating gated recurrent networks for speech synthesis

Z Wu, S King - … Conference on Acoustics, Speech and Signal …, 2016 - ieeexplore.ieee.org
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged
as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long …

Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices

H Zen, Y Agiomyrgiannakis, N Egberts… - arxiv preprint arxiv …, 2016 - arxiv.org
Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs)
were applied to statistical parametric speech synthesis (SPSS) and showed significant …