SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents. K Cheng, Q Sun, Y Chu, F Xu, Y Li, J Zhang, Z Wu. ACL 2024. Cited by 97.
A survey of neural code intelligence: Paradigms, advances and beyond. Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin, J Wang, C Han, R Zhu, ... arXiv preprint arXiv:2403.14734, 2024. Cited by 33*.
Beyond generic: Enhancing image captioning with real-world knowledge using vision-language pre-training model. K Cheng, W Song, Z Ma, W Zhu, Z Zhu, J Zhang. Proceedings of the 31st ACM International Conference on Multimedia, 5038-5047, 2023. Cited by 11.
Interactive evolution: A neural-symbolic self-training framework for large language models. F Xu, Q Sun, K Cheng, J Liu, Y Qiao, Z Wu. arXiv preprint arXiv:2406.11736, 2024. Cited by 7.
OS-Atlas: A foundation action model for generalist GUI agents. Z Wu, Z Wu, F Xu, Y Wang, Q Sun, C Jia, K Cheng, Z Ding, L Chen, ... ICLR 2025, 2024. Cited by 6.
ADS-Cap: A framework for accurate and diverse stylized captioning with unpaired stylistic corpora. K Cheng, Z Ma, S Zong, J Zhang, X Dai, J Chen. CCF International Conference on Natural Language Processing and Chinese …, 2022. Cited by 5.
Food-500 Cap: A fine-grained food caption benchmark for evaluating vision-language models. Z Ma, M Pan, W Wu, K Cheng, J Zhang, S Huang, J Chen. Proceedings of the 31st ACM International Conference on Multimedia, 5674-5685, 2023. Cited by 4.
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis. Q Sun*, K Cheng*, Z Ding*, C Jin*, Y Wang, F Xu, Z Wu, C Jia, L Chen, ... arXiv preprint arXiv:2412.19723, 2024. Cited by 2.
Vision-language models can self-improve reasoning via reflection. K Cheng*, Y Li*, F Xu, J Zhang, H Zhou, Y Liu. NAACL 2025, 2024. Cited by 2.
Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description. M Pan, J Li, M Yu, Z Ma, K Cheng, J Zhang, J Chen. arXiv preprint arXiv:2312.07294, 2023.