Wenyi Yu
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
SALMONN: Towards generic hearing abilities for large language models
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
Cited by: 217 | Year: 2023
Connecting speech encoder and large language model for ASR
W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 45 | Year: 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang
arXiv preprint arXiv:2406.15704, 2024
Cited by: 23* | Year: 2024
Extending large language models for speech and audio captioning
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 7 | Year: 2024
Can Large Language Models Understand Spatial Audio?
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, J Zhang, L Lu, Z Ma, Y Wang, ...
arXiv preprint arXiv:2406.07914, 2024
Cited by: 3 | Year: 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent
X Wu, W Yu, C Zhang, P Woodland
arXiv preprint arXiv:2406.06420, 2024
Cited by: 2 | Year: 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
S Wang, W Yu, Y Yang, C Tang, Y Li, J Zhuang, X Chen, X Tian, J Zhang, ...
arXiv preprint arXiv:2409.16644, 2024
Cited by: 1 | Year: 2024
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement
Y Yang, Z Liu, W Yu, G Sun, Q Kong, C Zhang
arXiv preprint arXiv:2409.09642, 2024
Cited by: 1 | Year: 2024
MAV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Z Chen, H Liu, W Yu, G Sun, H Liu, J Wu, C Zhang, Y Wang, Y Wang
arXiv preprint arXiv:2403.14168, 2024
Cited by: 1 | Year: 2024
Articles 1–9