Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents Z Wang, S Cai, G Chen, A Liu, X Ma, Y Liang NeurIPS 2023, 2023 | 375* | 2023 |
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Z Wang, S Cai, A Liu, Y Jin, J Hou, B Zhang, H Lin, Z He, Z Zheng, Y Yang, ... TPAMI 2024, 2023 | 79 | 2023 |
Rethinking Graph Neural Architecture Search from Message-Passing S Cai, L Li, J Deng, B Zhang, ZJ Zha, L Su, Q Huang CVPR 2021, 2021 | 74* | 2021 |
Open-World Multi-Task Control Through Goal-aware Representation Learning and Adaptive Horizon Prediction S Cai, Z Wang, X Ma, A Liu, Y Liang CVPR 2023, 2023 | 39* | 2023 |
GROOT: Learning to Follow Instructions by Watching Gameplay Videos S Cai, B Zhang, Z Wang, X Ma, A Liu, Y Liang ICLR 2024, Spotlight Presentation, 2023 | 28 | 2023 |
IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning Z Liu, J Deng, L Li, S Cai, Q Xu, S Wang, Q Huang ACM MM 2020, Oral Presentation, 2020 | 20 | 2020 |
DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editings B Li, S Cai, W Liu, P Zhang, Q He, M Hua, Z Yi WACV 2023, 2023 | 13 | 2023 |
Automatic Relation-Aware Graph Network Proliferation S Cai, L Li, X Han, J Luo, ZJ Zha, Q Huang CVPR 2022, Oral Presentation, 2022 | 11 | 2022 |
Inductive State-Relabeling Adversarial Active Learning with Heuristic Clique Rescaling B Zhang, L Li, S Wang, S Cai, ZJ Zha, Q Tian, Q Huang TPAMI 2024, 2024 | 8 | 2024 |
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Z Wang, S Cai, Z Mu, H Lin, C Zhang, X Liu, Q Li, A Liu, X Ma, Y Liang NeurIPS 2024, 2024 | 7* | 2024 |
Semantic and Correlation Disentangled Graph Convolutions for Multilabel Image Recognition S Cai, L Li, X Han, S Huang, Q Tian, Q Huang TNNLS 2023, 2023 | 7 | 2023 |
Edge-featured Graph Neural Architecture Search S Cai, L Li, X Han, Z Zha, Q Huang arXiv preprint arXiv:2109.01356, 2021 | 7 | 2021 |
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting S Cai, Z Wang, K Lian, Z Mu, X Ma, A Liu, Y Liang NeurIPS 2024 Workshop on OWA, Oral Presentation, 2024 | 2 | 2024 |
GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents S Cai, B Zhang, Z Wang, H Lin, X Ma, A Liu, Y Liang ICLR 2025, 2024 | 1* | 2024 |
MineStudio: A Streamlined Package for Minecraft AI Agent Development S Cai, Z Mu, K He, B Zhang, X Zheng, A Liu, Y Liang arXiv preprint arXiv:2412.18293, 2024 | | 2024 |
Optimizing Latent Goal by Learning from Trajectory Preference G Zhao, K Lian, H Lin, H Fu, Q Fu, S Cai, Z Wang, Y Liang arXiv preprint arXiv:2412.02125, 2024 | | 2024 |
Training Open-ended Policies to follow Video-prompt Instructions with Reinforcement Learning K He, B Zhang, Z Wang, S Cai, Q Fu, H Fu, A Liu, Y Liang | | |