| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Qwen2-Audio Technical Report | Y Chu, J Xu, Q Yang, H Wei, X Wei, Z Guo, Y Leng, Y Lv, J He, J Lin, ... | arXiv preprint arXiv:2407.10759, 2024 | 76 | 2024 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Q Yang, J Xu, W Liu, Y Chu, Z Jiang, X Zhou, Y Leng, Y Lv, Z Zhao, ... | arXiv preprint arXiv:2402.07729, 2024 | 31 | 2024 |
| SELM: Speech enhancement using discrete tokens and language models | Z Wang, X Zhu, Z Zhang, YJ Lv, N Jiang, G Zhao, L Xie | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 19 | 2024 |
| Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation | H Li, L Xue, H Guo, X Zhu, Y Lv, L Xie, Y Chen, H Yin, Z Li | arXiv preprint arXiv:2406.07422, 2024 | 18 | 2024 |
| Vec-Tok Speech: Speech vectorization and tokenization for neural speech generation | X Zhu, Y Lv, Y Lei, T Li, W He, H Zhou, H Lu, L Xie | arXiv preprint arXiv:2310.07246, 2023 | 11 | 2023 |
| SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation | Y Lv, J Yao, P Chen, H Zhou, H Lu, L Xie | 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 6 | 2023 |
| FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter | Y Lv, H Li, Y Yan, J Liu, D Xie, L Xie | arXiv preprint arXiv:2406.08196, 2024 | 3 | 2024 |
| RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement | M Liu, Z Chen, X Yan, Y Lv, X Xia, C Huang, Y Xiao, L Xie | arXiv preprint arXiv:2401.04389, 2024 | 3 | 2024 |
| Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy | L Ma, X Zhu, Y Lv, Z Wang, Z Wang, W He, H Zhou, L Xie | arXiv preprint arXiv:2406.09844, 2024 | 2 | 2024 |
| HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS | D Guo, X Zhu, L Xue, T Li, Y Lv, Y Jiang, L Xie | 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 2 | 2023 |
| Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models | W Liu, Z Guo, J Xu, Y Lv, Y Chu, Z Zhao, J Lin | arXiv preprint arXiv:2409.19283, 2024 | 1 | 2024 |
| RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention | M Liu, Z Chen, X Yan, Y Lv, X Xia, C Huang, Y Xiao, L Xie | arXiv preprint arXiv:2406.07498, 2024 | | 2024 |