Google Scholar

Towards universal speech discrete tokens: A case study for ASR and TTS

Y Yang, F Shen, C Du, Z Ma, K Yu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into
utilizing discrete tokens for speech tasks like recognition and translation, which offer lower …

Cited by 27

CTC variations through new WFST topologies

A Laptev, S Majumdar, B Ginsburg - ar…

… in neural transducer

Y Yang, X Yang, L Guo, Z Yao, W Kang… - arXiv preprint arXiv …, 2023 - arxiv.org

Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end
automatic speech recognition systems. Due to their frame-synchronous design, blank …

Cited by 9

Unsupervised Domain Adaptation on End-to-End Multi-talker Overlapped Speech Recognition

L Zheng, H Zhu, S Tian, Q Zhao… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org

Serialized Output Training (SOT) has emerged as the mainstream approach for addressing
the multi-talker overlapped speech recognition challenge due to its simplicity. However, SOT …

Cited by 1

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

M Cui, Y Yang, J Deng, J Kang, S Hu, T Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning (SSL) based discrete speech representations are highly compact
and domain adaptable. In this paper, SSL discrete speech features extracted from WavLM …

Cited by 1

Efficient Cascaded Streaming ASR System via Frame Rate Reduction

X Cai, D Qiu, S Ding, D Hwang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In this paper, we explore various frame rate reduction schemes on the two-pass cascaded
encoder model to improve its efficiency without sacrificing the transcription quality. We …

Cited by 2

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

Y Guo, C Wang, Y Yang, H Wang, Z Ma… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org

Discrete speech tokens have been more and more popular in multiple speech processing
fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice …

Cited by 1

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer

V Bataev, S Ghosh, V Lavrukhin, J Li - arXiv preprint arXiv:2501.06320, 2025 - arxiv.org

This work introduces TTS-Transducer, a novel architecture for text-to-speech, leveraging the
strengths of audio codec models and neural transducers. Transducers, renowned for their …


The Vicomtech Speech Transcription Systems for the Albayzin 2024 Bilingual Basque-Spanish Speech to Text (BBS-S2T) Challenge

JC Vásquez-Correa, A Alvarez, H Arzelus… - Proceedings of …, 2024 - isca-archive.org

This paper presents Vicomtech's submission to the Albayzín 2024 Bilingual
Basque-Spanish Speech-to-Text Challenge, which involves evaluating automatic speech …

Cited by 1

Powerful and Extensible WFST Framework for RNN-Transducer Losses

A Laptev, V Bataev, I Gitman… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to
simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing …

Cited by 1

Saved in my library

Speech recognition with next-generation kaldi (k2, lhotse, icefall)

Towards universal speech discrete tokens: A case study for asr and tts