Empowering whisper as a joint multi-talker and target-talker speech recognition system

L Meng, J Kang, Y Wang, Z **, X Wu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-talker speech recognition and target-talker speech recognition, both of which involve
transcription in multi-talker contexts, remain significant challenges. However, existing …

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

P Guo, X Chang, H Lv, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Benefiting from massive and diverse data sources, speech foundation models exhibit strong
generalization and knowledge transfer capabilities to a wide range of downstream tasks …

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

L Meng, S Hu, J Kang, Z Li, Y Wang, W Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have revolutionized various
domains, bringing significant progress and new opportunities. Despite progress in speech …

Target speaker ASR with Whisper

A Polok, D Klement, M Wiesner, S Khudanpur… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel approach to enable the use of large, single-speaker ASR models, such
as Whisper, for target speaker ASR. The key insight of this method is that it is much easier to …

Keyword Guided Target Speech Recognition

Y Shi, L Li, D Wang, J Han - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
This letter presents a new target speech recognition problem, where the target speech is
defined by a keyword. For instance, when a person speaks “Hey Google” or “Help Me”, we …

Alignment-Free Training for Transducer-based Multi-Talker ASR

T Moriya, S Horiguchi, M Delcroix, R Masumura… - arXiv preprint arXiv …, 2024 - arxiv.org
Extending the RNN Transducer (RNNT) to recognize multi-talker speech is essential for
wider automatic speech recognition (ASR) applications. Multi-talker RNNT (MT-RNNT) aims …

Extending Whisper with prompt tuning to target-speaker ASR

H Ma, Z Peng, M Shao, J Li, J Liu - ICASSP 2024 - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech
of a target speaker from multi-talker overlapped utterances. Most of the existing target …

Investigation of Speaker Representation for Target-Speaker Speech Processing

T Ashihara, T Moriya, S Horiguchi… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …

Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens

Y Kashiwagi, H Futami, E Tsunoo, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
In many real-world scenarios, such as meetings, multiple speakers are present with an
unknown number of participants, and their utterances often overlap. We address these multi …

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

J Wang, Z Liang, X Zhang, N Cheng, J **ao - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …