Encoding of lexical tone in self-supervised models of spoken language
Interpretability research has shown that self-supervised Spoken Language Models (SLMs)
encode a wide variety of features in human speech from the acoustic, phonetic …
encode a wide variety of features in human speech from the acoustic, phonetic …
Computational Modelling of Tone Perception Based on Direct Processing of f0 Contours
It has been widely assumed that in speech perception it is imperative to first detect a set of
distinctive properties or features and then use them to recognize phonetic units like …
distinctive properties or features and then use them to recognize phonetic units like …
Decoupling recognition and transcription in mandarin asr
Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end
approach. Unlike English where the writing system is closely related to sound, Chinese …
approach. Unlike English where the writing system is closely related to sound, Chinese …
Multi-variant consistency based self-supervised learning for robust automatic speech recognition
Automatic speech recognition (ASR) has shown rapid advances in recent years but still
degrades significantly in far-field and noisy environments. The recent development of self …
degrades significantly in far-field and noisy environments. The recent development of self …
[PDF][PDF] Improved contextualized speech representations for tonal analysis
We propose fine-tuning wav2vec2. 0 with a cross-entropy loss to classify tones in an
utterance on a frame-by-frame basis. Our study demonstrates that this approach not only …
utterance on a frame-by-frame basis. Our study demonstrates that this approach not only …
A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models
A de la Fuente, D Jurafsky - arxiv preprint arxiv:2408.13678, 2024 - arxiv.org
This study asks how self-supervised speech models represent suprasegmental categories
like Mandarin lexical tone, English lexical stress, and English phrasal accents. Through a …
like Mandarin lexical tone, English lexical stress, and English phrasal accents. Through a …
Automated Tone Transcription and Clustering with Tone2Vec
Lexical tones play a crucial role in Sino-Tibetan languages. However, current phonetic
fieldwork relies on manual effort, resulting in substantial time and financial costs. This is …
fieldwork relies on manual effort, resulting in substantial time and financial costs. This is …
[PDF][PDF] Deep Prosodic Features in Tandem with Perceptual Judgments of Word Reduction for Tone Recognition in Conversed Speech
XL Lu, YF Liu - Proc. Interspeech 2024, 2024 - isca-archive.org
To tackle the tone classification problem in conversational speech, we propose a
transformer-based encoding network to classify tones in an utterance on a syllable-by …
transformer-based encoding network to classify tones in an utterance on a syllable-by …
[PDF][PDF] Low-Resource Speech Recognition for Thousands of Languages
X Li - 2023 - kilthub.cmu.edu
Recently, the performance of speech recognition has witnessed rapid improvement due to
modern architectures. Those models typically require thousands of hours of training data for …
modern architectures. Those models typically require thousands of hours of training data for …
Data Augmentation for the Post-Stroke Speech Transcription (PSST) Challenge: Sometimes Less is More
We employ the method of fine-tuning wav2vec2. 0 for recognition of phonemes in aphasic
speech. Our effort focuses on data augmentation, by supplementing data from both in …
speech. Our effort focuses on data augmentation, by supplementing data from both in …