Generative multimodal models are in-context learners Q Sun, Y Cui, X Zhang, F Zhang, Q Yu, Y Wang, Y Rao, J Liu, T Huang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 198 | 2024 |
Generative Pretraining in Multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... The Twelfth International Conference on Learning Representations, 2023 | 165 | 2023 |
Efficient multimodal learning from data-centric perspective M He, Y Liu, B Wu, J Yuan, Y Wang, T Huang, B Zhao arXiv preprint arXiv:2402.11530, 2024 | 81 | 2024 |
Emu3: Next-token prediction is all you need X Wang, X Zhang, Z Luo, Q Sun, Y Cui, J Wang, F Zhang, Y Wang, Z Li, ... arXiv preprint arXiv:2409.18869, 2024 | 68 | 2024 |
Emu: Generative pretraining in multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... The Twelfth International Conference on Learning Representations, 2023 | 59 | 2023 |
Fine-grained visual prompting L Yang, Y Wang, X Li, X Wang, J Yang Advances in Neural Information Processing Systems 36, 2024 | 54 | 2024 |
OmniGen: Unified image generation S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan, S Wang, T Huang, Z Liu arXiv preprint arXiv:2409.11340, 2024 | 25 | 2024 |
DSMENet: Detail and structure mutually enhancing network for under-sampled MRI reconstruction Y Wang, Y Pang, C Tong Computers in Biology and Medicine 154, 106204, 2023 | 20 | 2023 |
DenseFusion-1M: Merging vision experts for comprehensive multimodal perception X Li, F Zhang, H Diao, Y Wang, X Wang, LY Duan arXiv preprint arXiv:2407.08303, 2024 | 16 | 2024 |
Unveiling Encoder-Free Vision-Language Models H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang arXiv preprint arXiv:2406.11832, 2024 | 13 | 2024 |
Universal prompt optimizer for safe text-to-image generation Z Wu, H Gao, Y Wang, X Zhang, S Wang Proceedings of the 2024 Conference of the North American Chapter of the …, 2024 | 12 | 2024 |
HIWDNet: a hybrid image-wavelet domain network for fast magnetic resonance image reconstruction C Tong, Y Pang, Y Wang Computers in Biology and Medicine 151, 105947, 2022 | 12 | 2022 |
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Y Liu, Z Liang, Y Wang, M He, J Li, B Zhao arXiv preprint arXiv:2406.10638, 2024 | 6 | 2024 |
Emu: Generative Pretraining in Multimodality Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... URL http://arxiv.org/abs/2307.05222, 2024 | 2* | 2024 |
Generative Pretraining in Multimodality. CoRR abs/2307.05222 (2023) Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang, ... | 2 | 2023 |
Generative pretraining in multimodality (2023) Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang, H Gao, J Liu, T Huang arXiv preprint arXiv:2307.05222, 2023 | 2* | 2023 |
Generative Multimodal Models are In-Context Learners. CoRR abs/2312.13286 (2023) Q Sun, Y Cui, X Zhang, F Zhang, Q Yu, Z Luo, Y Wang, Y Rao, J Liu, ... | 2 | 2023 |
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval J Zhou, Z Liu, Z Liu, S Xiao, Y Wang, B Zhao, CJ Zhang, D Lian, Y Xiong arXiv preprint arXiv:2412.14475, 2024 | | 2024 |
Fine-Grained Visual Text Prompting L Yang, X Li, Y Wang, X Wang, J Yang IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | | 2024 |