Emu: Generative pretraining in multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... arXiv preprint arXiv:2307.05222, 2023 | 230 | 2023 |
Generative multimodal models are in-context learners Q Sun, Y Cui, X Zhang, F Zhang, Q Yu, Y Wang, Y Rao, J Liu, T Huang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 208 | 2024 |
Efficient multimodal learning from data-centric perspective M He, Y Liu, B Wu, J Yuan, Y Wang, T Huang, B Zhao arXiv preprint arXiv:2402.11530, 2024 | 88 | 2024 |
Emu3: Next-token prediction is all you need X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang, F Zhang, Y Wang, Z Li, ... arXiv preprint arXiv:2409.18869, 2024 | 76 | 2024 |
Fine-grained visual prompting L Yang, Y Wang, X Li, X Wang, J Yang Advances in Neural Information Processing Systems 36, 24993-25006, 2023 | 56 | 2023 |
OmniGen: Unified image generation S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan, S Wang, T Huang, Z Liu arXiv preprint arXiv:2409.11340, 2024 | 28 | 2024 |
DenseFusion-1M: Merging vision experts for comprehensive multimodal perception X Li, F Zhang, H Diao, Y Wang, X Wang, LY Duan arXiv preprint arXiv:2407.08303, 2024 | 20 | 2024 |
DSMENet: Detail and structure mutually enhancing network for under-sampled MRI reconstruction Y Wang, Y Pang, C Tong Computers in Biology and Medicine 154, 106204, 2023 | 20 | 2023 |
Unveiling encoder-free vision-language models H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang arXiv preprint arXiv:2406.11832, 2024 | 19 | 2024 |
Universal prompt optimizer for safe text-to-image generation Z Wu, H Gao, Y Wang, X Zhang, S Wang Proceedings of the 2024 Conference of the North American Chapter of the …, 2024 | 13 | 2024 |
HIWDNet: A hybrid image-wavelet domain network for fast magnetic resonance image reconstruction C Tong, Y Pang, Y Wang Computers in Biology and Medicine 151, 105947, 2022 | 12 | 2022 |
Seeing clearly, answering incorrectly: A multimodal robustness benchmark for evaluating MLLMs on leading questions Y Liu, Z Liang, Y Wang, M He, J Li, B Zhao arXiv preprint arXiv:2406.10638, 2024 | 7 | 2024 |
Generative Pretraining in Multimodality. CoRR abs/2307.05222 (2023) Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... | 4 | 2023 |
Generative Multimodal Models are In-Context Learners. CoRR abs/2312.13286 (2023) Q Sun, Y Cui, X Zhang, F Zhang, Q Yu, Z Luo, Y Wang, Y Rao, J Liu, ... | 4 | 2023 |
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval J Zhou, Z Liu, Z Liu, S Xiao, Y Wang, B Zhao, CJ Zhang, D Lian, Y Xiong arXiv preprint arXiv:2412.14475, 2024 | 3 | 2024 |
Emu: Generative Pretraining in Multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... URL http://arxiv.org/abs/2307.05222, 2024 | 3* | 2024 |
Generative pretraining in multimodality (2023) Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang arXiv preprint arXiv:2307.05222, 2023 | 2* | 2023 |
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models H Diao, X Li, Y Cui, Y Wang, H Deng, T Pan, W Wang, H Lu, X Wang arXiv preprint arXiv:2502.06788, 2025 | | 2025 |
Fine-Grained Visual Text Prompting L Yang, X Li, Y Wang, X Wang, J Yang IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | | 2024 |
Generative Pretraining in Multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... The Twelfth International Conference on Learning Representations, 2023 | | 2023 |