Weiyun Wang
Shanghai AI Laboratory; Fudan University
Verified email at pjlab.org.cn
Title
Cited by
Year
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui, W Tong, K Hu, J Luo, Z Ma, ...
Science China Information Sciences 67 (12), 220101, 2024
348 · 2024
InternGPT: Solving vision-centric tasks by interacting with ChatGPT beyond language
Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...
arXiv preprint arXiv:2305.05662, 2023
88 · 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
W Wang, M Shi, Q Li, W Wang, Z Huang, L Xing, Z Chen, H Li, X Zhu, ...
The Twelfth International Conference on Learning Representations (ICLR 2024), 2023
78 · 2023
Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures
Y Duan, W Wang, Z Chen, X Zhu, L Lu, T Lu, Y Qiao, H Li, J Dai, W Wang
arXiv preprint arXiv:2403.02308, 2024
36 · 2024
MM-Interleaved: Interleaved image-text generative modeling via multi-modal feature synchronizer
C Tian, X Zhu, Y Xiong, W Wang, Z Chen, W Wang, Y Chen, L Lu, T Lu, ...
arXiv preprint arXiv:2401.10208, 2024
36 · 2024
The All-Seeing Project V2: Towards general relation comprehension of the open world
W Wang, Y Ren, H Luo, T Li, C Yan, Z Chen, W Wang, Q Li, L Lu, X Zhu, ...
The 18th European Conference on Computer Vision ECCV 2024, 2024
35 · 2024
Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling
Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui, J Zhu, S Ye, H Tian, Z Liu, ...
arXiv preprint arXiv:2412.05271, 2024
22 · 2024
MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity
Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang, H Tian, L Lu, X Zhu, T Lu, ...
Science China Information Sciences 67 (12), 1-16, 2024
15 · 2024
Demystify transformers & convolutions in modern image deep networks
X Hu, M Shi, W Wang, S Wu, L Xing, W Wang, X Zhu, L Lu, J Zhou, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
14 · 2024
Needle In A Multimodal Haystack
W Wang, S Zhang, Y Ren, Y Duan, T Li, S Liu, M Hu, Z Chen, K Zhang, ...
arXiv preprint arXiv:2406.07230, 2024
11 · 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Q Li, Z Chen, W Wang, W Wang, S Ye, Z Jin, G Chen, Y He, Z Gao, E Cui, ...
arXiv preprint arXiv:2406.08418, 2024
10 · 2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian, S Ye, J He, X Zhu, ...
Visual Intelligence 2 (1), 1-17, 2024
8 · 2024
InternVL2: Better than the best -- expanding performance boundaries of open-source multimodal models with the progressive scaling strategy
8* · 2024
CLIPText: A new paradigm for zero-shot text classification
L Qin, W Wang, Q Chen, W Che
Findings of the Association for Computational Linguistics: ACL 2023, 1077-1088, 2023
6 · 2023
Enhancing the reasoning ability of multimodal large language models via mixed preference optimization
W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao, J Zhu, X Zhu, L Lu, ...
arXiv preprint arXiv:2411.10442, 2024
5 · 2024
ChemVLM: Exploring the power of multimodal large language models in the chemistry area
J Li, D Zhang, X Wang, Z Hao, J Lei, Q Tan, C Zhou, W Liu, Y Yang, ...
arXiv preprint arXiv:2408.07246, 2024
4 · 2024
Seeing and understanding: Bridging vision with chemical knowledge via ChemVLM
J Li, D Zhang, X Wang, Z Hao, J Lei, Q Tan, C Zhou, W Liu, W Wang, ...
arXiv preprint arXiv:2408.07246, 2024
2 · 2024
Articles 1–17