Wenyi Hong

Citations

	All	Since 2020
Citations	2734	2733
h-index	10	10
i10-index	10	10

Citations per year
2021	16
2022	144
2023	537
2024	1813
2025	214

Public access

3 articles available
0 articles not available

Based on funding mandates

Co-authors

Ming Ding, Tsinghua University. Verified email at mails.tsinghua.edu.cn
Tang Jie, WeBank Chair Professor, Tsinghua University. Verified email at tsinghua.edu.cn
Wendi Zheng, PhD Student, Tsinghua University. Verified email at mails.tsinghua.edu.cn
Zhuoyi Yang, Tsinghua University. Verified email at mails.tsinghua.edu.cn
Jiazheng Xu, Tsinghua University. Verified email at mails.tsinghua.edu.cn
Yuxiao Dong, CS, Tsinghua University. Verified email at tsinghua.edu.cn

Wenyi Hong

Tsinghua University

Verified email at mails.tsinghua.edu.cn

multimodal pretraining


Title	Cited by	Year
Cogview: Mastering text-to-image generation via transformers. M Ding, Z Yang, W Hong, W Zheng, C Zhou, D Yin, J Lin, X Zou, Z Shao, ... Advances in neural information processing systems 34, 19822-19835, 2021	803	2021
CogVLM: Visual expert for pretrained language models. W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang, J Ji, Z Yang, L Zhao, X Song, ... NeurIPS 2024, 2023	560	2023
CogVideo: Large-Scale Pretraining for Text-to-Video Generation via Transformers. W Hong, M Ding, W Zheng, X Liu, J Tang. The Eleventh International Conference on Learning Representations (ICLR 2023), 2022	481	2022
Cogview2: Faster and better text-to-image generation via hierarchical transformers. M Ding, W Zheng, W Hong, J Tang. Advances in Neural Information Processing Systems 35, 16890-16902, 2022	326	2022
CogAgent: A Visual Language Model for GUI Agents. W Hong, W Wang, Q Lv, J Xu, W Yu, J Ji, Y Wang, Z Wang, Y Dong, ... The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, 2023	242	2023
Cogvideox: Text-to-video diffusion models with an expert transformer. Z Yang, J Teng, W Zheng, M Ding, S Huang, J Xu, Y Yang, W Hong, ... arXiv preprint arXiv:2408.06072, 2024	183	2024
Cogvlm2: Visual language models for image and video understanding. W Hong, W Wang, M Ding, W Yu, Q Lv, Y Wang, Y Cheng, S Huang, J Ji, ... arXiv preprint arXiv:2408.16500, 2024	57	2024
Lvbench: An extreme long video understanding benchmark. W Wang, Z He, W Hong, Y Cheng, X Zhang, J Qi, X Gu, S Huang, B Xu, ... arXiv preprint arXiv:2406.08035, 2024	27	2024
Relay diffusion: Unifying diffusion process across resolutions for image synthesis. J Teng, W Zheng, M Ding, W Hong, J Wangni, Z Yang, J Tang. ICLR 2024, 2023	22	2023
Cogcom: Train large vision-language models diving into details through chain of manipulations. J Qi, M Ding, W Wang, Y Bai, Q Lv, W Hong, B Xu, L Hou, J Li, Y Dong, ... arXiv preprint arXiv:2402.04236, 2024	20	2024
Visualagentbench: Towards large multimodal models as visual foundation agents. X Liu, T Zhang, Y Gu, IL Iong, Y Xu, X Song, S Zhang, H Lai, X Liu, H Zhao, ... arXiv preprint arXiv:2408.06327, 2024	8	2024
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer. Z Yang, H Jiang, W Hong, J Teng, W Zheng, Y Dong, M Ding, J Tang. ECCV 2024, 2024	4	2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model. Z Yang, J Chen, Z Du, W Yu, W Wang, W Hong, Z Jiang, B Xu, Y Dong, ... arXiv preprint arXiv:2409.13729, 2024	1	2024
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models. W Hong, Y Cheng, Z Yang, W Wang, L Wang, X Gu, S Huang, Y Dong, ... arXiv preprint arXiv:2501.02955, 2025		2025
