Google Scholar

Empowering whisper as a joint multi-talker and target-talker speech recognition system

L Meng, J Kang, Y Wang, Z Jin, X Wu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-talker speech recognition and target-talker speech recognition, both involve
transcription in multi-talker contexts, remain significant challenges. However, existing …

Cited by 6

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

P Guo, X Chang, H Lv, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Benefiting from massive and diverse data sources, speech foundation models exhibit strong
generalization and knowledge transfer capabilities to a wide range of downstream tasks …

Cited by 2

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

L Meng, S Hu, J Kang, Z Li, Y Wang, W Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in large language models (LLMs) have revolutionized various
domains, bringing significant progress and new opportunities. Despite progress in speech …

Cited by 2

Target speaker ASR with Whisper

A Polok, D Klement, M Wiesner, S Khudanpur… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose a novel approach to enable the use of large, single speaker ASR models, such
as Whisper, for target speaker ASR. The key insight of this method is that it is much easier to …

Cited by 2

Keyword Guided Target Speech Recognition

Y Shi, L Li, D Wang, J Han - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org

This letter presents a new target speech recognition problem, where the target speech is
defined by a keyword. For instance, when a person speaks “Hey Google” or “Help Me”, we …

Cited by 1

Alignment-Free Training for Transducer-based Multi-Talker ASR

T Moriya, S Horiguchi, M Delcroix, R Masumura… - arXiv preprint arXiv …, 2024 - arxiv.org

Extending the RNN Transducer (RNNT) to recognize multi-talker speech is essential for
wider automatic speech recognition (ASR) applications. Multi-talker RNNT (MT-RNNT) aims …

Cited by 1

Extending Whisper with prompt tuning to target-speaker ASR

H Ma, Z Peng, M Shao, J Li, J Liu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech
of a target speaker from multi-talker overlapped utterances. Most of the existing target …

Cited by 16

Investigation of Speaker Representation for Target-Speaker Speech Processing

T Ashihara, T Moriya, S Horiguchi… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Target-speaker speech processing (TS) tasks, such as target-speaker automatic speech
recognition (TS-ASR), target speech extraction (TSE), and personal voice activity detection …

Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens

Y Kashiwagi, H Futami, E Tsunoo, S Arora… - arXiv preprint arXiv …, 2024 - arxiv.org

In many real-world scenarios, such as meetings, multiple speakers are present with an
unknown number of participants, and their utterances often overlap. We address these multi …

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

J Wang, Z Liang, X Zhang, N Cheng, J Xiao - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …

Cited by 1

Conformer-based target-speaker automatic speech recognition for single-channel audio

Empowering whisper as a joint multi-talker and target-talker speech recognition system