Siyuan Huang
Shanghai AI Lab && SJTU && MMLab CUHK
Verified email at sjtu.edu.cn
Title
Cited by
Year
Sphinx: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models
Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, C Lin, W Shao, ...
arXiv preprint arXiv:2311.07575, 2023
Cited by: 203 | Year: 2023
Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models
P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by: 183 | Year: 2024
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023
Cited by: 177 | Year: 2023
Multi-modal sensor fusion for auto driving perception: A survey
K Huang, B Shi, X Li, X Li, S Huang, Y Li
arXiv preprint arXiv:2202.02703, 2022
Cited by: 148 | Year: 2022
Instruct2act: Mapping multi-modality instructions to robotic actions with large language model
S Huang, Z Jiang, H Dong, Y Qiao, P Gao, H Li
arXiv preprint arXiv:2305.11176, 2023
Cited by: 134 | Year: 2023
Sphinx-x: Scaling data and parameters for a family of multi-modal large language models
D Liu, R Zhang, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, P Jin, ...
arXiv preprint arXiv:2402.05935, 2024
Cited by: 101 | Year: 2024
Tiny lvlm-ehub: Early multimodal experiments with bard
W Shao, Y Hu, P Gao, M Lei, K Zhang, F Meng, P Xu, S Huang, H Li, ...
arXiv preprint arXiv:2308.03729, 2023
Cited by: 34 | Year: 2023
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill
W Cai, S Huang, G Cheng, Y Long, P Gao, C Sun, H Dong
ICRA 2024, 2023
Cited by: 29 | Year: 2023
Gui odyssey: A comprehensive dataset for cross-app gui navigation on mobile devices
Q Lu, W Shao, Z Liu, F Meng, B Li, B Chen, S Huang, K Zhang, Y Qiao, ...
arXiv preprint arXiv:2406.08451, 2024
Cited by: 22 | Year: 2024
Not all experts are equal: Efficient expert pruning and skipping for mixture-of-experts large language models
X Lu, Q Liu, Y Xu, A Zhou, S Huang, B Zhang, J Yan, H Li
arXiv preprint arXiv:2402.14800, 2024
Cited by: 22 | Year: 2024
ManipVQA: Injecting robotic affordance and physically grounded information into multi-modal large language models
S Huang, I Ponomarenko, Z Jiang, X Li, X Hu, P Gao, H Li, H Dong
International Conference on Intelligent Robots and Systems (IROS), 2024
Cited by: 17 | Year: 2024
Amex: Android multi-annotation expo dataset for mobile gui agents
Y Chai, S Huang, Y Niu, H Xiao, L Liu, D Zhang, P Gao, S Ren, H Li
arXiv preprint arXiv:2407.17490, 2024
Cited by: 16 | Year: 2024
Sug: Single-dataset unified generalization for 3d point cloud classification
S Huang, B Zhang, B Shi, H Li, Y Li, P Gao
Proceedings of the 31st ACM International Conference on Multimedia, 8644-8652, 2023
Cited by: 12 | Year: 2023
A3VLM: Actionable Articulation-Aware Vision Language Model
S Huang, H Chang, Y Liu, Y Zhu, H Dong, P Gao, A Boularias, H Li
Conference on Robot Learning (CoRL), 2024
Cited by: 10 | Year: 2024
Draw-and-understand: Leveraging visual prompts to enable mllms to comprehend what you want
W Lin, X Wei, R An, P Gao, B Zou, Y Luo, S Huang, S Zhang, H Li
arXiv preprint arXiv:2403.20271, 2024
Cited by: 8 | Year: 2024
PixWizard: Versatile image-to-image visual assistant with open-language instructions
W Lin, X Wei, R Zhang, L Zhuo, S Zhao, S Huang, J Xie, Y Qiao, P Gao, ...
arXiv preprint arXiv:2409.15278, 2024
Cited by: 3 | Year: 2024
Adas: A simple active-and-adaptive baseline for cross-domain 3d semantic segmentation
B Fei, S Huang, J Yuan, B Shi, B Zhang, T Chen, M Dou, Y Qiao
arXiv preprint arXiv:2212.10390, 2022
Cited by: 3 | Year: 2022
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Q Yu, S Huang, X Yuan, Z Jiang, C Hao, X Li, H Chang, J Wang, L Liu, ...
IEEE International Conference on Robotics and Automation (ICRA), 2024
Cited by: 1 | Year: 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Z Lin, D Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, W Shao, K Chen, ...
European Conference on Computer Vision, 36-55, 2024
Cited by: 1 | Year: 2024
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
S Huang, L Chen, P Zhou, S Chen, Z Jiang, Y Hu, P Gao, H Li, M Yao, ...
arXiv preprint arXiv:2501.01895, 2025
Year: 2025