Robust wav2vec 2.0: Analyzing domain shift in self-supervised pre-training
Self-supervised learning of speech representations has been a very active research area,
but most work is focused on a single domain, such as read audio books, for which there exist …
Textually pretrained speech language models
Speech language models (SpeechLMs) process and generate acoustic data only, without
textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using …
Audiobox: Unified audio generation with natural language prompts
Audio is an essential part of our lives, but creating it often requires expertise and is
time-consuming. Research communities have made great progress over the past year advancing …
TitaNet: Neural model for speaker representation with 1D depth-wise separable convolutions and global context
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker
representations. We employ 1D depth-wise separable convolutions with Squeeze-and …
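The snippet names the architecture's core pattern: a 1D depth-wise separable convolution followed by Squeeze-and-Excitation-style channel gating. Below is a minimal PyTorch sketch of that pattern only; the shapes, kernel size, and reduction ratio are illustrative assumptions, not the authors' TitaNet configuration.

import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    # Channel-wise gating computed from globally pooled (time-averaged) context.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        scale = self.fc(x.mean(dim=-1))   # squeeze: global average over time
        return x * scale.unsqueeze(-1)    # excite: rescale each channel

class DepthwiseSeparableConv1d(nn.Module):
    # Depth-wise conv (one filter per channel) then a point-wise 1x1 mixing conv.
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.se = SqueezeExcite1d(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.se(self.pointwise(self.depthwise(x)))

feats = torch.randn(4, 80, 200)                    # batch of 80-dim filterbank frames
block = DepthwiseSeparableConv1d(80, 256, kernel_size=5)
print(block(feats).shape)                          # torch.Size([4, 256, 200])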
Rethinking evaluation in ASR: Are our models robust enough?
Is pushing numbers on a single benchmark valuable in automatic speech recognition?
Research results in acoustic modeling are typically evaluated based on performance on a …
Word informativity influences acoustic duration: Effects of contextual predictability on lexical representation
S. Seyfarth, Cognition, 2014, Elsevier
Language-users reduce words in predictable contexts. Previous research indicates
that reduction may be stored in lexical representation if a word is often reduced. Because …
A Review of Natural-Language-Instructed Robot Execution Systems
It is natural and efficient to use human natural language (NL) directly to instruct robot task
execution, without prior user knowledge of instruction patterns. Currently, NL-instructed …
slimIPL: Language-model-free iterative pseudo-labeling
Recent results in end-to-end automatic speech recognition have demonstrated the efficacy
of pseudo-labeling for semi-supervised models trained both with Connectionist Temporal …
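The snippet describes iterative pseudo-labeling for semi-supervised CTC training without an external language model: the model alternates supervised steps with steps on its own greedily decoded transcripts, which are kept in a small dynamic cache and periodically regenerated. The sketch below illustrates that loop on toy data; the model, cache size, mixing probability, and data are hypothetical stand-ins, not the paper's recipe.

import random
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = 32                                    # toy alphabet; index 0 is the CTC blank
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def greedy_decode(log_probs):
    # Collapse repeats and drop blanks; no external language model is used.
    ids = log_probs.argmax(-1).squeeze(1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != 0:
            out.append(i)
        prev = i
    return out or [1]                         # avoid empty targets on toy data

def ctc_step(audio, target):
    log_probs = model(audio).log_softmax(-1)  # (time, batch=1, vocab)
    t = torch.tensor(target)
    return ctc(log_probs, t.unsqueeze(0),
               torch.tensor([log_probs.size(0)]), torch.tensor([len(t)]))

labeled = [(torch.randn(50, 1, 16), [5, 9, 3])]         # toy (audio, transcript) pair
unlabeled = [torch.randn(50, 1, 16) for _ in range(8)]
cache = []                                              # dynamic pseudo-label cache

for step in range(20):
    loss = ctc_step(*random.choice(labeled))            # supervised term
    if cache and random.random() < 0.5:                 # reuse a cached pseudo-label...
        u_audio, pseudo = random.choice(cache)
    else:                                               # ...or relabel with current model
        u_audio = random.choice(unlabeled)
        with torch.no_grad():
            pseudo = greedy_decode(model(u_audio).log_softmax(-1))
        cache.append((u_audio, pseudo))
        cache = cache[-100:]
    loss = loss + ctc_step(u_audio, pseudo)             # pseudo-labeled term
    optimizer.zero_grad(); loss.backward(); optimizer.step()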
Performance-efficiency trade-offs in unsupervised pre-training for speech recognition
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic
speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture …
Individual differences in artificial and natural language statistical learning
Statistical learning (SL) is considered a cornerstone of cognition. While decades of research
have unveiled the remarkable breadth of structures that participants can learn from statistical …