UATVR: Uncertainty-adaptive text-video retrieval B Fang, W Wu, C Liu, Y Zhou, Y Song, W Wang, X Shu, X Ji, J Wang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 48 | 2023 |
GPT4Vis: what can GPT-4 do for zero-shot visual recognition? W Wu, H Yao, M Zhang, Y Song, W Ouyang, J Wang arXiv preprint arXiv:2311.15732, 2023 | 26 | 2023 |
GRATIS: Deep learning graph representation with task-specific topology and multi-dimensional edge features S Song, Y Song, C Luo, Z Song, S Kuzucu, X Jia, Z Guo, W Xie, L Shen, ... arXiv preprint arXiv:2211.12482, 2022 | 26 | 2022 |
Transferring vision-language models for visual recognition: A classifier perspective W Wu, Z Sun, Y Song, J Wang, W Ouyang International Journal of Computer Vision 132 (2), 392-409, 2024 | 19 | 2024 |
DALG: Deep attentive local and global modeling for image retrieval Y Song, R Zhu, M Yang, D He arXiv preprint arXiv:2207.00287, 2022 | 15 | 2022 |
Dense Connector for MLLMs H Yao, W Wu, T Yang, YX Song, M Zhang, H Feng, Y Sun, Z Li, W Ouyang, ... arXiv preprint arXiv:2405.13800, 2024 | 11 | 2024 |
What Can Simple Arithmetic Operations Do for Temporal Modeling? W Wu, Y Song, Z Sun, J Wang, C Xu, W Ouyang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 10 | 2023 |
It takes two: Masked appearance-motion modeling for self-supervised video transformer pre-training Y Song, M Yang, W Wu, D He, F Li, J Wang arXiv preprint arXiv:2210.05234, 2022 | 10 | 2022 |
MonoFormer: One transformer for both diffusion and autoregression C Zhao, Y Song, W Wang, H Feng, E Ding, Y Sun, X Xiao, J Wang arXiv preprint arXiv:2409.16280, 2024 | 8 | 2024 |
Automated multi-level preference for MLLMs M Zhang, W Wu, Y Lu, Y Song, K Rong, H Yao, J Zhao, F Liu, Y Sun, ... arXiv preprint arXiv:2405.11165, 2024 | 5 | 2024 |
MERG: Multi-Dimensional Edge Representation Generation Layer for Graph Neural Networks Y Song, C Luo, A Jackson, X Jia, W Xie, L Shen, H Gunes, S Song ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 3 | 2024 |
Mulberry: Empowering MLLM with o1-like reasoning and reflection via collective Monte Carlo tree search H Yao, J Huang, W Wu, J Zhang, Y Wang, S Liu, Y Wang, Y Song, H Feng, ... arXiv preprint arXiv:2412.18319, 2024 | 2 | 2024 |
Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction Y Hou, Q Ren, S Song, Y Song, W Wang, D Botteldooren ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 1 | 2024 |
The Key of Understanding Vision Tasks: Explanatory Instructions Y Shen, XS Wei, Y Sun, Y Song, T Yuan, J Jin, H Xu, Y Yao, E Ding arXiv preprint arXiv:2412.18525, 2024 | | 2024 |
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Y Shen, XS Wei, Y Sun, Y Song, T Yuan, J Jin, H Xu, Y Yao, E Ding arXiv e-prints, arXiv:2412.18525, 2024 | | 2024 |
DistinctAD: Distinctive Audio Description Generation in Contexts B Fang, W Wu, Q Wu, Y Song, AB Chan arXiv preprint arXiv:2411.18180, 2024 | | 2024 |
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding C Zhao, YX Song, J Chen, K Rong, H Feng, G Zhang, S Ji, J Wang, ... The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 | | 2024 |