Encoding of lexical tone in self-supervised models of spoken language

G Shen, M Watkins, A Alishahi, A Bisazza - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretability research has shown that self-supervised Spoken Language Models (SLMs)
encode a wide variety of features in human speech from the acoustic, phonetic …

Computational Modelling of Tone Perception Based on Direct Processing of f0 Contours

Y Chen, Y Gao, Y Xu - Brain Sciences, 2022 - mdpi.com
It has been widely assumed that in speech perception it is imperative to first detect a set of
distinctive properties or features and then use them to recognize phonetic units like …

Decoupling recognition and transcription in Mandarin ASR

J Yuan, X Cai, D Gao, R Zheng… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end
approach. Unlike English where the writing system is closely related to sound, Chinese …

Multi-variant consistency based self-supervised learning for robust automatic speech recognition

C Gao, G Cheng, P Zhang - arXiv preprint arXiv:2112.12522, 2021 - arxiv.org
Automatic speech recognition (ASR) has shown rapid advances in recent years but still
degrades significantly in far-field and noisy environments. The recent development of self …

Improved contextualized speech representations for tonal analysis

J Yuan, X Cai, K Church - Proceedings of Interspeech, 2023 - isca-archive.org
We propose fine-tuning wav2vec2.0 with a cross-entropy loss to classify tones in an
utterance on a frame-by-frame basis. Our study demonstrates that this approach not only …

A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models

A de la Fuente, D Jurafsky - arXiv preprint arXiv:2408.13678, 2024 - arxiv.org
This study asks how self-supervised speech models represent suprasegmental categories
like Mandarin lexical tone, English lexical stress, and English phrasal accents. Through a …

Automated Tone Transcription and Clustering with Tone2Vec

Y Yang, Y Wang, ZQ Tang, J Yuan - arXiv preprint arXiv:2410.02324, 2024 - arxiv.org
Lexical tones play a crucial role in Sino-Tibetan languages. However, current phonetic
fieldwork relies on manual effort, resulting in substantial time and financial costs. This is …

Deep Prosodic Features in Tandem with Perceptual Judgments of Word Reduction for Tone Recognition in Conversed Speech

XL Lu, YF Liu - Proc. Interspeech 2024, 2024 - isca-archive.org
To tackle the tone classification problem in conversational speech, we propose a
transformer-based encoding network to classify tones in an utterance on a syllable-by …

Low-Resource Speech Recognition for Thousands of Languages

X Li - 2023 - kilthub.cmu.edu
Recently, the performance of speech recognition has witnessed rapid improvement due to
modern architectures. Those models typically require thousands of hours of training data for …

Data Augmentation for the Post-Stroke Speech Transcription (PSST) Challenge: Sometimes Less is More

J Yuan, X Cai, K Church - … of the RaPID Workshop-Resources and …, 2022 - aclanthology.org
We employ the method of fine-tuning wav2vec2.0 for recognition of phonemes in aphasic
speech. Our effort focuses on data augmentation, by supplementing data from both in …