Learning to assemble neural module tree networks for visual grounding D Liu, H Zhang, F Wu, ZJ Zha Proceedings of the IEEE International Conference on Computer Vision, 4673-4682, 2019 | 306 | 2019 |
Learning to compose and reason with language tree structures for visual grounding R Hong, D Liu, X Mo, X He, H Zhang IEEE transactions on pattern analysis and machine intelligence 44 (2), 684-696, 2019 | 177 | 2019 |
More grounded image captioning by distilling image-text matching model Y Zhou, M Wang, D Liu, Z Hu, H Zhang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 173 | 2020 |
Context-aware visual policy network for fine-grained image captioning ZJ Zha, D Liu, H Zhang, Y Zhang, F Wu IEEE transactions on pattern analysis and machine intelligence 44 (2), 710-722, 2019 | 158 | 2019 |
Context-aware visual policy network for sequence-level image captioning D Liu, ZJ Zha, H Zhang, Y Zhang, F Wu Proceedings of the 2018 ACM on Multimedia Conference, 1416-1424, 2018 | 121 | 2018 |
SemMAE: Semantic-guided masking for learning masked autoencoders G Li, H Zheng, D Liu, C Wang, B Su, C Zheng Advances in Neural Information Processing Systems 35, 14290-14302, 2022 | 115 | 2022 |
Learning to discretely compose reasoning module networks for video captioning G Tan, D Liu, M Wang, ZJ Zha Proceedings of the Twenty-Ninth International Joint Conference on Artificial …, 2020 | 83 | 2020 |
TransVG++: End-to-end visual grounding with language conditioned vision transformer J Deng, Z Yang, D Liu, T Chen, W Zhou, Y Zhang, H Li, W Ouyang IEEE transactions on pattern analysis and machine intelligence, 2023 | 60 | 2023 |
Modeling image composition for complex scene generation Z Yang, D Liu, C Wang, J Yang, D Tao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 54 | 2022 |
Compact bidirectional transformer for image captioning Y Zhou, Z Hu, D Liu, H Ben, M Wang arXiv preprint arXiv:2201.01984, 2022 | 25 | 2022 |
Cocktail: Mixing multi-modality control for text-conditional image generation M Hu, J Zheng, D Liu, C Zheng, C Wang, D Tao, TJ Cham Thirty-seventh Conference on Neural Information Processing Systems, 2023 | 19 | 2023 |
Modeling video as stochastic processes for fine-grained video representation learning H Zhang, D Liu, Q Zheng, B Su Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 19 | 2023 |
Eliminating contextual prior bias for semantic image editing via dual-cycle diffusion Z Yang, T Chu, X Lin, E Gao, D Liu, J Yang, C Wang IEEE Transactions on Circuits and Systems for Video Technology 34 (2), 1316-1320, 2023 | 13 | 2023 |
Joint Visual Grounding with Language Scene Graphs D Liu, H Zhang, ZJ Zha, M Wang, Q Sun arXiv preprint arXiv:1906.03561, 2019 | 13* | 2019 |
ESceme: Vision-and-language navigation with episodic scene memory Q Zheng, D Liu, C Wang, J Zhang, D Wang, D Tao International Journal of Computer Vision, 1-21, 2024 | 6 | 2024 |
Exploring Temporal Concurrency for Video-Language Representation Learning H Zhang, D Liu, Z Lv, B Su, D Tao Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 5 | 2023 |
Semantically-consistent dynamic blurry image generation for image deblurring Z Jing, Y Zhang, C Wang, D Liu, Y Xia Proceedings of the 30th ACM International Conference on Multimedia, 2547-2555, 2022 | 5 | 2022 |
Efficiently gluing pre-trained language and vision models for image captioning P Song, Y Zhou, X Yang, D Liu, Z Hu, D Wang, M Wang ACM Transactions on Intelligent Systems and Technology 15 (6), 1-16, 2024 | 3 | 2024 |
Cross-modal contrastive learning for robust reasoning in VQA Q Zheng 2024 7th International Conference on Pattern Recognition and Artificial …, 2024 | 1 | 2024 |
MMoT: Mixture-of-modality-tokens transformer for composed multimodal conditional image synthesis J Zheng, D Liu, C Wang, M Hu, Z Yang, C Ding, D Tao International Journal of Computer Vision, 1-29, 2024 | 1 | 2024 |