Yifan Yang
Title
Cited by
Year
Zipformer: A faster and better encoder for automatic speech recognition
Z Yao, L Guo, X Yang, W Kang, F Kuang, Y Yang, Z Jin, L Lin, D Povey
Proc. ICLR 2024 (Oral), 2023
Cited by 90, 2023
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ...
Proc. AAAI 2025 (Oral), 2024
Cited by 36*, 2024
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
W Kang, X Yang, Z Yao, F Kuang, Y Yang, L Guo, L Lin, D Povey
Proc. ICASSP 2024 (Oral), 2023
Cited by 35, 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Y Yang, F Shen, C Du, Z Ma, K Yu, D Povey, X Chen
Proc. ICASSP 2024 (Oral), 2023
Cited by 27, 2023
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
C Du, Y Guo, H Wang, Y Yang, Z Niu, S Wang, H Zhang, X Chen, K Yu
Proc. ICASSP 2025, 2024
Cited by 22, 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Z Song, J Zhuo, Y Yang, Z Ma, S Zhang, X Chen
Proc. INTERSPEECH 2024, 2024
Cited by 13, 2024
PromptASR for contextualized ASR with controllable style
X Yang, W Kang, Z Yao, Y Yang, L Guo, F Kuang, L Lin, D Povey
Proc. ICASSP 2024 (Oral), 2023
Cited by 9, 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer
Y Yang, X Yang, L Guo, Z Yao, W Kang, F Kuang, L Lin, X Chen, D Povey
Proc. INTERSPEECH 2023, 2023
Cited by 9, 2023
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Y Yang, Z Song, J Zhuo, M Cui, J Li, B Yang, Y Du, Z Ma, X Liu, Z Wang, ...
arXiv preprint arXiv:2406.11546, 2024
Cited by 5, 2024
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
W Chen, Z Ma, R Yan, Y Liang, X Li, R Xu, Z Niu, Y Zhu, Y Yang, Z Liu, ...
arXiv preprint arXiv:2412.15649, 2024
Cited by 4, 2024
Delay-penalized CTC implemented based on Finite State Transducer
Z Yao, W Kang, F Kuang, L Guo, X Yang, Y Yang, L Lin, D Povey
Proc. INTERSPEECH 2023, 2023
Cited by 3, 2023
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers
Y Yang, Z Ma, S Liu, J Li, H Wang, L Meng, H Sun, Y Liang, R Xu, Y Hu, ...
arXiv preprint arXiv:2412.16102, 2024
Cited by 2, 2024
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Y Du, Z Ma, Y Yang, K Deng, X Chen, B Yang, Y Xiang, M Liu, B Qin
arXiv preprint arXiv:2409.19510, 2024
Cited by 2, 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Z Jin*, Y Yang*, M Shi*, W Kang, X Yang, Z Yao, F Kuang, L Guo, L Meng, ...
Proc. INTERSPEECH 2024 (Oral), 2024
Cited by 2, 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
M Cui, Y Yang, J Deng, J Kang, S Hu, T Wang, Z Li, S Zhang, X Chen, ...
arXiv preprint arXiv:2409.08797, 2024
Cited by 1, 2024
Exploring SSL Discrete Tokens for Multilingual ASR
M Cui, D Tan, Y Yang, D Wang, H Wang, X Chen, X Chen, X Liu
arXiv preprint arXiv:2409.08805, 2024
Cited by 1, 2024
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
H Wang, S Liu, L Meng, J Li, Y Yang, S Zhao, H Sun, Y Liu, H Sun, J Zhou, ...
arXiv preprint arXiv:2502.11128, 2025
2025
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
Y Yang, J Zhuo, Z Jin, Z Ma, X Yang, Z Yao, L Guo, W Kang, F Kuang, ...
arXiv preprint arXiv:2411.17100, 2024
2024
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
P Wang, Y Yang, Z Liang, T Tan, S Zhang, X Chen
Proc. INTERSPEECH 2024, 2023
2023