gao zhifu
Tongyi Lab, Alibaba Group
Verified email at alibaba-inc.com
Title
Cited by
Year
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition
Z Gao, S Zhang, I McLoughlin, Z Yan
arXiv preprint arXiv:2206.08317, 2022
Cited by: 102, Year: 2022
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System
Z Gao, Y Song, IV McLoughlin, P Li, Y Jiang, LR Dai
INTERSPEECH 2019, 361-365, 2019
Cited by: 88, Year: 2019
emotion2vec: Self-supervised pre-training for speech emotion representation
Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen
arXiv preprint arXiv:2312.15185, 2023
Cited by: 82, Year: 2023
LauraGPT: Listen, attend, understand, and regenerate audio with GPT
Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ...
arXiv preprint arXiv:2310.04673, 2023
Cited by: 68, Year: 2023
CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens
Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ...
arXiv preprint arXiv:2407.05407, 2024
Cited by: 63, Year: 2024
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li, L Zuo, Z Du, Z Xiao, ...
INTERSPEECH 2023, 2023
Cited by: 53, Year: 2023
SAN-M: Memory equipped self-attention for end-to-end speech recognition
Z Gao, S Zhang, M Lei, I McLoughlin
INTERSPEECH 2020, 6-10, 2020
Cited by: 36, Year: 2020
An Effective Deep Embedding Learning Architecture for Speaker Verification
Y Jiang, Y Song, IV McLoughlin, Z Gao, LR Dai
INTERSPEECH 2019, 4040-4044, 2019
Cited by: 36, Year: 2019
Streaming chunk-aware multihead attention for online end-to-end speech recognition
S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan, L Xie
INTERSPEECH 2020, 2142-2146, 2020
Cited by: 32, Year: 2020
An improved deep embedding learning method for short duration speaker verification
Z Gao, Y Song, IV McLoughlin, W Guo, LR Dai
INTERSPEECH 2018, 3578-3582, 2018
Cited by: 32, Year: 2018
An embarrassingly simple approach for LLM with strong ASR capacity
Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ...
arXiv preprint arXiv:2402.08846, 2024
Cited by: 26, Year: 2024
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs
K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ...
arXiv preprint arXiv:2407.04051, 2024
Cited by: 25, Year: 2024
Extremely Low Footprint End-to-End ASR System for Smart Device
Z Gao, Y Yao, S Zhang, J Yang, M Lei, I McLoughlin
INTERSPEECH 2021, 4548-4552, 2021
Cited by: 16, Year: 2021
Universal ASR: Unifying streaming and non-streaming ASR using a single encoder-decoder model
Z Gao, S Zhang, M Lei, I McLoughlin
arXiv preprint arXiv:2010.14099, 2020
Cited by: 16, Year: 2020
SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability
X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 13, Year: 2024
MaLa-ASR: Multimedia-assisted LLM-based ASR
G Yang, Z Ma, F Yu, Z Gao, S Zhang, X Chen
arXiv preprint arXiv:2406.05839, 2024
Cited by: 8, Year: 2024
CosyVoice 2: Scalable streaming speech synthesis with large language models
Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ...
arXiv preprint arXiv:2412.10117, 2024
Cited by: 6, Year: 2024
MinMo: A multimodal large language model for seamless voice interaction
Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ...
arXiv preprint arXiv:2501.06282, 2025
Cited by: 2, Year: 2025
Wav2vec‐MoE: An unsupervised pre‐training and adaptation method for multi‐accent ASR
Y Lin, S Zhang, Z Gao, L Wang, Y Yang, J Dang
Electronics Letters 59 (11), e12823, 2023
Cited by: 2, Year: 2023
CTC-Assisted LLM-Based Contextual ASR
G Yang, Z Ma, Z Gao, S Zhang, X Chen
2024 IEEE Spoken Language Technology Workshop (SLT), 126-131, 2024
Cited by: 1, Year: 2024
Articles 1–20