SALMONN: Towards generic hearing abilities for large language models C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
217 2023
Connecting speech encoder and large language model for ASR W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
45 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang
arXiv preprint arXiv:2406.15704, 2024
23 2024
Extending large language models for speech and audio captioning C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
7 2024
Can Large Language Models Understand Spatial Audio? C Tang, W Yu, G Sun, X Chen, T Tan, W Li, J Zhang, L Lu, Z Ma, Y Wang, ...
arXiv preprint arXiv:2406.07914, 2024
3 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent X Wu, W Yu, C Zhang, P Woodland
arXiv preprint arXiv:2406.06420, 2024
2 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation S Wang, W Yu, Y Yang, C Tang, Y Li, J Zhuang, X Chen, X Tian, J Zhang, ...
arXiv preprint arXiv:2409.16644, 2024
1 2024
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement Y Yang, Z Liu, W Yu, G Sun, Q Kong, C Zhang
arXiv preprint arXiv:2409.09642, 2024
1 2024
M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset Z Chen, H Liu, W Yu, G Sun, H Liu, J Wu, C Zhang, Y Wang, Y Wang
arXiv preprint arXiv:2403.14168, 2024
1 2024