CogView: Mastering text-to-image generation via transformers M Ding, Z Yang, W Hong, W Zheng, C Zhou, D Yin, J Lin, X Zou, Z Shao, ... Advances in neural information processing systems 34, 19822-19835, 2021 | 803 | 2021 |
CogVLM: Visual expert for pretrained language models W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang, J Ji, Z Yang, L Zhao, X Song, ... NeurIPS 2024, 2023 | 560 | 2023 |
CogVideo: Large-Scale Pretraining for Text-to-Video Generation via Transformers W Hong, M Ding, W Zheng, X Liu, J Tang The Eleventh International Conference on Learning Representations (ICLR 2023), 2022 | 481 | 2022 |
CogView2: Faster and better text-to-image generation via hierarchical transformers M Ding, W Zheng, W Hong, J Tang Advances in Neural Information Processing Systems 35, 16890-16902, 2022 | 326 | 2022 |
CogAgent: A Visual Language Model for GUI Agents W Hong, W Wang, Q Lv, J Xu, W Yu, J Ji, Y Wang, Z Wang, Y Dong, ... The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, 2023 | 242 | 2023 |
CogVideoX: Text-to-video diffusion models with an expert transformer Z Yang, J Teng, W Zheng, M Ding, S Huang, J Xu, Y Yang, W Hong, ... arXiv preprint arXiv:2408.06072, 2024 | 183 | 2024 |
CogVLM2: Visual language models for image and video understanding W Hong, W Wang, M Ding, W Yu, Q Lv, Y Wang, Y Cheng, S Huang, J Ji, ... arXiv preprint arXiv:2408.16500, 2024 | 57 | 2024 |
LVBench: An extreme long video understanding benchmark W Wang, Z He, W Hong, Y Cheng, X Zhang, J Qi, X Gu, S Huang, B Xu, ... arXiv preprint arXiv:2406.08035, 2024 | 27 | 2024 |
Relay diffusion: Unifying diffusion process across resolutions for image synthesis J Teng, W Zheng, M Ding, W Hong, J Wangni, Z Yang, J Tang ICLR 2024, 2023 | 22 | 2023 |
CogCoM: Train large vision-language models diving into details through chain of manipulations J Qi, M Ding, W Wang, Y Bai, Q Lv, W Hong, B Xu, L Hou, J Li, Y Dong, ... arXiv preprint arXiv:2402.04236, 2024 | 20 | 2024 |
VisualAgentBench: Towards large multimodal models as visual foundation agents X Liu, T Zhang, Y Gu, IL Iong, Y Xu, X Song, S Zhang, H Lai, X Liu, H Zhao, ... arXiv preprint arXiv:2408.06327, 2024 | 8 | 2024 |
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer Z Yang, H Jiang, W Hong, J Teng, W Zheng, Y Dong, M Ding, J Tang ECCV 2024, 2024 | 4 | 2024 |
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model Z Yang, J Chen, Z Du, W Yu, W Wang, W Hong, Z Jiang, B Xu, Y Dong, ... arXiv preprint arXiv:2409.13729, 2024 | 1 | 2024 |
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models W Hong, Y Cheng, Z Yang, W Wang, L Wang, X Gu, S Huang, Y Dong, ... arXiv preprint arXiv:2501.02955, 2025 | | 2025 |