Mu Cai
Final-year PhD Student, University of Wisconsin-Madison
Verified email at cs.wisc.edu
Title
Cited by
Year
VOS: Learning What You Don't Know by Virtual Outlier Synthesis
X Du, Z Wang, M Cai, Y Li
ICLR 2022, 2022
Cited by: 347, Year: 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
H Liu, M Cai, YJ Lee
ECCV 2022, 2022
Cited by: 165, Year: 2022
Investigating the catastrophic forgetting in multimodal large language models
Y Zhai, S Tong, X Li, M Cai, Q Qu, YJ Lee, Y Ma
Conference on Parsimony and Learning (CPAL) 2023, 2023
Cited by: 134*, Year: 2023
Frequency domain image translation: More photo-realistic, better identity-preserving
M Cai, H Zhang, H Huang, Q Geng, Y Li, G Huang
ICCV 2021, 2021
Cited by: 100, Year: 2021
ViP-LLaVA: Making large multimodal models understand arbitrary visual prompts
M Cai, H Liu, SK Mustikovela, GP Meyer, Y Chai, D Park, YJ Lee
CVPR 2024, 2024
Cited by: 86, Year: 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Y Shang*, M Cai*, B Xu, YJ Lee, Y Yan
arXiv preprint arXiv:2403.15388, 2024
Cited by: 75, Year: 2024
Out-of-distribution Detection via Frequency-regularized Generative Models
M Cai, Y Li
WACV (Spotlight), 2023, 2023
Cited by: 43, Year: 2023
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
Z Huang, A Zhou, Z Lin, M Cai, H Wang, YJ Lee
ICCV 2023, 2023
Cited by: 29, Year: 2023
Matryoshka Multimodal Models
M Cai, J Yang, J Gao, YJ Lee
ICLR 2025, 2025
Cited by: 21, Year: 2025
A Game-Theoretic Strategy-Aware Interaction Algorithm with Validation on Real Traffic Data
L Sun*, M Cai*, W Zhan, M Tomizuka
IROS 2020, 2020
Cited by: 20, Year: 2020
LLaRA: Supercharging robot learning data for vision-language policy
X Li, C Mata, J Park, K Kahatapitiya, YS Jang, J Shang, K Ranasinghe, ...
ICLR 2025, 2025
Cited by: 14, Year: 2025
An Investigation on LLMs’ Visual Understanding Ability Using SVG for Image-Text Bridging
M Cai*, Z Huang*, Y Li, H Wang, YJ Lee
WACV 2025, 2025
Cited by: 12*, Year: 2025
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
J Zhang*, M Cai*, T Xie, YJ Lee
Findings of ACL 2024, 2024
Cited by: 10, Year: 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
M Cai, R Tan, J Zhang, B Zou, K Zhang, F Yao, F Zhu, J Gu, Y Zhong, ...
NeurIPS 2024 Workshop on Video-Language Models, 2024
Cited by: 9*, Year: 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Y Shang, B Xu, W Kang, M Cai, Y Li, Z Wen, Z Dong, K Keutzer, YJ Lee, ...
arXiv preprint arXiv:2409.12963, 2024
Cited by: 5, Year: 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
T Nguyen, H Liu, Y Li, M Cai, U Ojha, YJ Lee
NeurIPS 2024, 2024
Cited by: 5, Year: 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
B Zou*, M Cai*, J Zhang, YJ Lee
EMNLP 2024, 2024
Cited by: 4*, Year: 2024
Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds
M Cai, C Luo, YJ Lee, X Yang
IROS 2024, 2024
Cited by: 1, Year: 2024
Magma: A Foundation Model for Multimodal AI Agents
J Yang, R Tan, Q Wu, R Zheng, B Peng, Y Liang, Y Gu, M Cai, S Ye, ...
arXiv preprint arXiv:2502.13130, 2025
Year: 2025
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
J Zhang*, M Cai*, YJ Lee
arXiv preprint arXiv:2410.02763, 2024
Year: 2024