DETRs with collaborative hybrid assignments training Z Zong, G Song, Y Liu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 387 | 2023 |
RAPHAEL: Text-to-image generation via large mixture of diffusion paths Z Xue, G Song, Q Guo, B Liu, Z Zong, Y Liu, P Luo Advances in Neural Information Processing Systems 36, 41693-41706, 2023 | 135 | 2023 |
Graph attention based proposal 3D convnets for action detection J Li, X Liu, Z Zong, W Zhao, M Zhang, J Song Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 4626-4633, 2020 | 54 | 2020 |
Visual CoT: Unleashing chain-of-thought reasoning in multi-modal language models H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li arXiv e-prints, arXiv:2403.16999, 2024 | 40 | 2024 |
MoVA: Adapting mixture of vision experts to multimodal context Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu arXiv preprint arXiv:2404.13046, 2024 | 36 | 2024 |
Temporal enhanced training of multi-view 3D object detector via historical object prediction Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 31 | 2023 |
Self-slimmed vision transformer Z Zong, K Li, G Song, Y Wang, Y Qiao, B Leng, Y Liu European Conference on Computer Vision, 432-448, 2022 | 31 | 2022 |
RCNet: Reverse feature pyramid and cross-scale shift network for object detection Z Zong, Q Cao, B Leng Proceedings of the 29th ACM International Conference on Multimedia, 5637-5645, 2021 | 23 | 2021 |
Temporal enhanced training of multi-view 3D object detector via historical object prediction Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu arXiv preprint arXiv:2304.00967, 2023 | 15 | 2023 |
CoMat: Aligning text-to-image diffusion model with image-to-text concept matching D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li Advances in Neural Information Processing Systems 37, 76177-76209, 2025 | 14 | 2025 |
Visual CoT: Advancing multi-modal language models with a comprehensive dataset and benchmark for chain-of-thought reasoning H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li Advances in Neural Information Processing Systems 37, 8612-8642, 2025 | 12 | 2025 |
Exploring the role of large language models in prompt encoding for diffusion models B Ma, Z Zong, G Song, H Li, Y Liu arXiv preprint arXiv:2406.11831, 2024 | 12 | 2024 |
DETRs with collaborative hybrid assignments training (2023) Z Zong, G Song, Y Liu arXiv preprint arXiv:2211.12860 | 7 | |
Large-batch optimization for dense visual predictions Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo Advances in Neural Information Processing Systems 1, 2022 | 5 | 2022 |
EasyRef: Omni-generalized group image reference for diffusion models via multimodal LLM Z Zong, D Jiang, B Ma, G Song, H Shao, D Shen, Y Liu, H Li arXiv preprint arXiv:2412.09618, 2024 | 4 | 2024 |
Large-batch optimization for dense visual predictions: Training Faster R-CNN in 4.2 minutes Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo Advances in Neural Information Processing Systems 35, 18694-18706, 2022 | 4 | 2022 |
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping H Shao, S Wang, Y Zhou, G Song, D He, S Qin, Z Zong, B Ma, Y Liu, H Li arXiv preprint arXiv:2412.11279, 2024 | | 2024 |