Learning to assemble neural module tree networks for visual grounding D Liu, H Zhang, F Wu, ZJ Zha Proceedings of the IEEE International Conference on Computer Vision, 4673-4682, 2019 | 305 | 2019 |
Learning to compose and reason with language tree structures for visual grounding R Hong, D Liu, X Mo, X He, H Zhang IEEE transactions on pattern analysis and machine intelligence 44 (2), 684-696, 2019 | 177 | 2019 |
More grounded image captioning by distilling image-text matching model Y Zhou, M Wang, D Liu, Z Hu, H Zhang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 169 | 2020 |
Context-aware visual policy network for fine-grained image captioning ZJ Zha, D Liu, H Zhang, Y Zhang, F Wu IEEE transactions on pattern analysis and machine intelligence 44 (2), 710-722, 2019 | 158 | 2019 |
SemMAE: Semantic-guided masking for learning masked autoencoders G Li, H Zheng, D Liu, C Wang, B Su, C Zheng Advances in Neural Information Processing Systems 35, 14290-14302, 2022 | 126 | 2022 |
Context-aware visual policy network for sequence-level image captioning D Liu, ZJ Zha, H Zhang, Y Zhang, F Wu Proceedings of the 2018 ACM on Multimedia Conference, 1416-1424, 2018 | 123 | 2018 |
Learning to discretely compose reasoning module networks for video captioning G Tan, D Liu, M Wang, ZJ Zha Proceedings of the Twenty-Ninth International Joint Conference on Artificial …, 2020 | 82 | 2020 |
TransVG++: End-to-end visual grounding with language conditioned vision transformer J Deng, Z Yang, D Liu, T Chen, W Zhou, Y Zhang, H Li, W Ouyang IEEE transactions on pattern analysis and machine intelligence 45 (11 …, 2023 | 61 | 2023 |
Modeling image composition for complex scene generation Z Yang, D Liu, C Wang, J Yang, D Tao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 54 | 2022 |
Compact bidirectional transformer for image captioning Y Zhou, Z Hu, D Liu, H Ben, M Wang arXiv preprint arXiv:2201.01984, 2022 | 25 | 2022 |
Cocktail: Mixing multi-modality control for text-conditional image generation M Hu, J Zheng, D Liu, C Zheng, C Wang, D Tao, TJ Cham Thirty-seventh Conference on Neural Information Processing Systems, 2023 | 20 | 2023 |
Modeling video as stochastic processes for fine-grained video representation learning H Zhang, D Liu, Q Zheng, B Su Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 20 | 2023 |
Eliminating contextual prior bias for semantic image editing via dual-cycle diffusion Z Yang, T Chu, X Lin, E Gao, D Liu, J Yang, C Wang IEEE Transactions on Circuits and Systems for Video Technology 34 (2), 1316-1320, 2023 | 18 | 2023 |
Joint Visual Grounding with Language Scene Graphs D Liu, H Zhang, ZJ Zha, M Wang, Q Sun arXiv preprint arXiv:1906.03561, 2019 | 12* | 2019 |
ESceme: Vision-and-language navigation with episodic scene memory Q Zheng, D Liu, C Wang, J Zhang, D Wang, D Tao International Journal of Computer Vision 133 (1), 254-274, 2025 | 5 | 2025 |
Exploring temporal concurrency for video-language representation learning H Zhang, D Liu, Z Lv, B Su, D Tao Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 4 | 2023 |
Semantically-consistent dynamic blurry image generation for image deblurring Z Jing, Y Zhang, C Wang, D Liu, Y Xia Proceedings of the 30th ACM International Conference on Multimedia, 2547-2555, 2022 | 4 | 2022 |
Efficiently gluing pre-trained language and vision models for image captioning P Song, Y Zhou, X Yang, D Liu, Z Hu, D Wang, M Wang ACM Transactions on Intelligent Systems and Technology 15 (6), 1-16, 2024 | 3 | 2024 |
MMoT: Mixture-of-modality-tokens transformer for composed multimodal conditional image synthesis J Zheng, D Liu, C Wang, M Hu, Z Yang, C Ding, D Tao International Journal of Computer Vision 132 (9), 3537-3565, 2024 | 1 | 2024 |
Cross-Modal Contrastive Learning for Robust Reasoning in VQA Q Zheng 2024 7th International Conference on Pattern Recognition and Artificial …, 2024 | 1 | 2024 |