Investigating local and global information for automated audio captioning with transfer learning X Xu, H Dinkel, M Wu, Z Xie, K Yu ICASSP 2021-2021 IEEE international conference on acoustics, speech and …, 2021 | 68 | 2021 |
Can audio captions be evaluated with image caption metrics? Z Zhou, Z Zhang, X Xu, Z Xie, M Wu, KQ Zhu ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 63 | 2022 |
The SJTU system for DCASE2022 challenge task 6: Audio captioning with audio-text retrieval pre-training X Xu, Z Xie, M Wu, K Yu Tech. Rep., DCASE2022 Challenge, 2022 | 37 | 2022 |
The SJTU system for DCASE2021 challenge task 6: Audio captioning based on encoder pre-training and reinforcement learning X Xu, Z Xie, M Wu, K Yu Proc. Conf. Detection Classification Acoust. Scenes Events, 1-4, 2021 | 18 | 2021 |
Beyond the status quo: A contemporary survey of advances and challenges in audio captioning X Xu, Z Xie, M Wu, K Yu IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 15 | 2023 |
BLAT: Bootstrapping language-audio pre-training based on AudioSet tag-guided synthetic data X Xu, Z Zhang, Z Zhou, P Zhang, Z Xie, M Wu, KQ Zhu Proceedings of the 31st ACM International Conference on Multimedia, 2756-2764, 2023 | 14 | 2023 |
Enhance temporal relations in audio captioning with sound event detection Z Xie, X Xu, M Wu, K Yu arXiv preprint arXiv:2306.01533, 2023 | 12 | 2023 |
PicoAudio: Enabling precise timestamp and frequency controllability of audio events in text-to-audio generation Z Xie, X Xu, Z Wu, M Wu arXiv preprint arXiv:2407.02869, 2024 | 11 | 2024 |
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds X Xu, X Xu, Z Xie, P Zhang, M Wu, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 5 | 2024 |
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset Z Xie, X Xu, Z Wu, M Wu arXiv preprint arXiv:2407.02857, 2024 | 4 | 2024 |
Enhancing Audio Generation Diversity with Visual Information Z Xie, B Li, X Xu, M Wu, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 4 | 2024 |
FakeSound: Deepfake General Audio Detection Z Xie, B Li, X Xu, Z Liang, K Yu, M Wu arXiv preprint arXiv:2406.08052, 2024 | 3 | 2024 |
Phonetic and Lexical Discovery of a Canine Language using HuBERT X Li, S Wang, Z Xie, M Wu, KQ Zhu arXiv preprint arXiv:2402.15985, 2024 | 1 | 2024 |
The X-LANCE system for DCASE2023 challenge task 7: Foley sound synthesis track B Z Xie, X Xu, B Li, M Wu, K Yu Tech. Rep., June, 2023 | 1 | 2023 |
Overview of the Amphion Toolkit (v0.2) J Li, X Zhang, Y Wang, H He, C Wang, L Wang, H Liao, J Ao, Z Xie, ... arXiv preprint arXiv:2501.15442, 2025 | | 2025 |
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation B Li, Z Xie, X Xu, Y Guo, M Yan, J Zhang, K Yu, M Wu arXiv preprint arXiv:2407.13198, 2024 | | 2024 |
Improving Audio Caption Fluency with Automatic Error Correction H Zhang, Z Xie, X Xu, M Wu, K Yu arXiv preprint arXiv:2306.10090, 2023 | | 2023 |