Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
Unsupervised speech representation learning using wavenet autoencoders
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
The diffusion of misinformation on social media: Temporal pattern, message, and source
This study examines dynamic communication processes of political misinformation on social
media focusing on three components: the temporal pattern, content mutation, and sources of …
Unsupervised learning of spoken language with visual context
Humans learn to speak before they can read or write, so why can't computers do the same?
In this paper, we present a deep neural network model capable of rudimentary spoken …
Jointly discovering visual objects and spoken words from raw sensory input
In this paper, we explore neural network models that learn to associate segments of spoken
audio captions with the semantically relevant portions of natural images that they refer to …
The zero resource speech challenge 2017
We describe a new challenge aimed at discovering subword and word units from raw
speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It …
The zero resource speech challenge 2015
The Interspeech 2015 Zero Resource Speech Challenge aims at discovering
subword and word units from raw speech. The challenge provides the first unified and open …
Deep convolutional acoustic word embeddings using word-pair side information
Recent studies have been revisiting whole words as the basic modelling unit in speech
recognition and query applications, instead of phonetic units. Such whole-word segmental …
Word discovery in visually grounded, self-supervised speech models
We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show …
Learning hierarchical discrete linguistic units from visually-grounded speech
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …