Task Me Anything | J Zhang, W Huang, Z Ma, O Michel, D He, T Gupta, WC Ma, A Farhadi, ... | NeurIPS 2024, 2024 | 53 | 2024 |
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks | Z Ma, W Huang, J Zhang, T Gupta, R Krishna | ECCV 2024, 2024 | 13 | 2024 |
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | J Zhang, L Xue, L Song, J Wang, W Huang, M Shu, A Yan, Z Ma, ... | arXiv preprint arXiv:2412.07012, 2024 | 3* | 2024 |
Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming | Z Gao, W Huang, J Zhang, A Kembhavi, R Krishna | arXiv preprint arXiv:2412.08221, 2024 | | 2024 |
Taskverse: A Benchmark Generation Engine for Multi-modal Language Model | J Zhang, W Huang, Z Ma, O Michel, D He, T Gupta, WC Ma, A Farhadi, ... | Workshop on Video-Language Models @ NeurIPS 2024 | | |