Yinan He
Shanghai AI Laboratory
Verified email at pjlab.org.cn
Title
Cited by
Year
Videochat: Chat-centric video understanding
KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao
arXiv preprint arXiv:2305.06355, 2023
Cited by 590 · 2023
Videomae v2: Scaling video masked autoencoders with dual masking
L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by 386 · 2023
Internvideo: General video foundation models via generative and discriminative learning
Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by 331 · 2022
Mvbench: A comprehensive multi-modal video understanding benchmark
K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by 259 · 2024
Lavie: High-quality video generation with cascaded latent diffusion models
Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ...
International Journal of Computer Vision, 1-20, 2024
Cited by 224 · 2024
Vbench: Comprehensive benchmark suite for video generative models
Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by 224 · 2024
Internvid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
arXiv preprint arXiv:2307.06942, 2023
Cited by 221 · 2023
Forgerynet: A versatile benchmark for comprehensive forgery analysis
Y He, B Gan, S Chen, Y Zhou, G Yin, L Song, L Sheng, J Shao, Z Liu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by 160 · 2021
Videomamba: State space model for efficient video understanding
K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao
European Conference on Computer Vision, 237-255, 2024
Cited by 150 · 2024
Unmasked teacher: Towards training-efficient video foundation models
K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by 144 · 2023
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
arXiv preprint arXiv:2211.09552, 2022
Cited by 129 · 2022
Internvideo2: Scaling foundation models for multimodal video understanding
Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, Z Wang, Y Shi, ...
European Conference on Computer Vision, 396-416, 2024
Cited by 119 · 2024
Interngpt: Solving vision-centric tasks by interacting with chatgpt beyond language
Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...
arXiv preprint arXiv:2305.05662, 2023
Cited by 90 · 2023
Uniformerv2: Unlocking the potential of image vits for video understanding
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by 50 · 2023
Internvideo-ego4d: A pack of champion solutions to ego4d challenges
G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...
arXiv preprint arXiv:2211.09529, 2022
Cited by 43 · 2022
Intern: A new learning paradigm towards general vision
J Shao, S Chen, Y Li, K Wang, Z Yin, Y He, J Teng, Q Sun, M Gao, J Liu, ...
arXiv preprint arXiv:2111.08687, 2021
Cited by 37 · 2021
From gpt-4 to gemini and beyond: Assessing the landscape of mllms on generalizability, trustworthiness and causality through four modalities
C Lu, C Qian, G Zheng, H Fan, H Gao, J Zhang, J Shao, J Deng, J Fu, ...
arXiv preprint arXiv:2401.15071, 2024
Cited by 14 · 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...
arXiv preprint arXiv:2406.08418, 2024
Cited by 9 · 2024
X-learner: Learning cross sources and tasks for universal visual representation
Y He, G Huang, S Chen, J Teng, K Wang, Z Yin, L Sheng, Z Liu, Y Qiao, ...
European Conference on Computer Vision, 509-528, 2022
Cited by 7 · 2022
Vbench++: Comprehensive and versatile benchmark suite for video generative models
Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong, Q Ma, N Chanpaisit, C Si, ...
arXiv preprint arXiv:2411.13503, 2024
Cited by 6 · 2024
Articles 1–20