Empowering Whisper as a joint multi-talker and target-talker speech recognition system
Multi-talker speech recognition and target-talker speech recognition, both of which involve
transcription in multi-talker contexts, remain significant challenges. However, existing …
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Benefiting from massive and diverse data sources, speech foundation models exhibit strong
generalization and knowledge transfer capabilities to a wide range of downstream tasks …
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Recent advancements in large language models (LLMs) have revolutionized various
domains, bringing significant progress and new opportunities. Despite progress in speech …
Target speaker ASR with Whisper
We propose a novel approach to enable the use of large, single-speaker ASR models, such
as Whisper, for target speaker ASR. The key insight of this method is that it is much easier to …
Keyword Guided Target Speech Recognition
This letter presents a new target speech recognition problem, where the target speech is
defined by a keyword. For instance, when a person speaks “Hey Google” or “Help Me”, we …
Alignment-Free Training for Transducer-based Multi-Talker ASR
Extending the RNN Transducer (RNNT) to recognize multi-talker speech is essential for
wider automatic speech recognition (ASR) applications. Multi-talker RNNT (MT-RNNT) aims …
Extending Whisper with prompt tuning to target-speaker ASR
Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech
of a target speaker from multi-talker overlapped utterances. Most of the existing target …
Investigation of Speaker Representation for Target-Speaker Speech Processing
Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens
In many real-world scenarios, such as meetings, multiple speakers are present with an
unknown number of participants, and their utterances often overlap. We address these multi …
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …