Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
Effectiveness of self-supervised pre-training for asr
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …
Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length
speech segments. For zero-resource languages where labelled data is not available, one …
Multilingual jointly trained acoustic and written word embeddings
Acoustic word embeddings (AWEs) are vector representations of spoken word segments.
AWEs can be learned jointly with embeddings of character sequences, to generate …
Improved acoustic word embeddings for zero-resource languages using multilingual transfer
Acoustic word embeddings are fixed-dimensional representations of variable-length speech
segments. Such embeddings can form the basis for speech search, indexing and discovery …