Wenyi Hong
Verified email at mails.tsinghua.edu.cn
Title | Cited by | Year
CogView: Mastering text-to-image generation via transformers
M Ding, Z Yang, W Hong, W Zheng, C Zhou, D Yin, J Lin, X Zou, Z Shao, ...
Advances in Neural Information Processing Systems 34, 19822-19835, 2021
Cited by: 803 | Year: 2021
CogVLM: Visual expert for pretrained language models
W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang, J Ji, Z Yang, L Zhao, X Song, ...
NeurIPS 2024, 2023
Cited by: 560 | Year: 2023
CogVideo: Large-Scale Pretraining for Text-to-Video Generation via Transformers
W Hong, M Ding, W Zheng, X Liu, J Tang
The Eleventh International Conference on Learning Representations (ICLR 2023), 2022
Cited by: 481 | Year: 2022
CogView2: Faster and better text-to-image generation via hierarchical transformers
M Ding, W Zheng, W Hong, J Tang
Advances in Neural Information Processing Systems 35, 16890-16902, 2022
Cited by: 326 | Year: 2022
CogAgent: A Visual Language Model for GUI Agents
W Hong, W Wang, Q Lv, J Xu, W Yu, J Ji, Y Wang, Z Wang, Y Dong, ...
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, 2023
Cited by: 242 | Year: 2023
CogVideoX: Text-to-video diffusion models with an expert transformer
Z Yang, J Teng, W Zheng, M Ding, S Huang, J Xu, Y Yang, W Hong, ...
arXiv preprint arXiv:2408.06072, 2024
Cited by: 183 | Year: 2024
CogVLM2: Visual language models for image and video understanding
W Hong, W Wang, M Ding, W Yu, Q Lv, Y Wang, Y Cheng, S Huang, J Ji, ...
arXiv preprint arXiv:2408.16500, 2024
Cited by: 57 | Year: 2024
LVBench: An extreme long video understanding benchmark
W Wang, Z He, W Hong, Y Cheng, X Zhang, J Qi, X Gu, S Huang, B Xu, ...
arXiv preprint arXiv:2406.08035, 2024
Cited by: 27 | Year: 2024
Relay diffusion: Unifying diffusion process across resolutions for image synthesis
J Teng, W Zheng, M Ding, W Hong, J Wangni, Z Yang, J Tang
ICLR 2024, 2023
Cited by: 22 | Year: 2023
CogCoM: Train large vision-language models diving into details through chain of manipulations
J Qi, M Ding, W Wang, Y Bai, Q Lv, W Hong, B Xu, L Hou, J Li, Y Dong, ...
arXiv preprint arXiv:2402.04236, 2024
Cited by: 20 | Year: 2024
VisualAgentBench: Towards large multimodal models as visual foundation agents
X Liu, T Zhang, Y Gu, IL Iong, Y Xu, X Song, S Zhang, H Lai, X Liu, H Zhao, ...
arXiv preprint arXiv:2408.06327, 2024
Cited by: 8 | Year: 2024
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Z Yang, H Jiang, W Hong, J Teng, W Zheng, Y Dong, M Ding, J Tang
ECCV 2024, 2024
Cited by: 4 | Year: 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Z Yang, J Chen, Z Du, W Yu, W Wang, W Hong, Z Jiang, B Xu, Y Dong, ...
arXiv preprint arXiv:2409.13729, 2024
Cited by: 1 | Year: 2024
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
W Hong, Y Cheng, Z Yang, W Wang, L Wang, X Gu, S Huang, Y Dong, ...
arXiv preprint arXiv:2501.02955, 2025
Year: 2025
Articles 1–14