| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding | L Xue, M Gao, C Xing, R Martín-Martín, J Wu, C Xiong, R Xu, JC Niebles, et al. | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition | 236 | 2023 |
| ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding | L Xue, N Yu, S Zhang, J Li, R Martín-Martín, J Wu, C Xiong, R Xu, et al. | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition | 99 | 2023 |
| BOLAA: Benchmarking and Orchestrating LLM-Augmented Autonomous Agents | Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, et al. | arXiv preprint arXiv:2308.05960 | 75 | 2023 |
| Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization | W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, et al. | arXiv preprint arXiv:2308.02151 | 59 | 2023 |
| X-InstructBLIP: A Framework for Aligning X-Modal Instruction-Aware Representations to LLMs and Emergent Cross-Modal Reasoning | A Panagopoulou, L Xue, N Yu, J Li, D Li, S Joty, R Xu, S Savarese, et al. | arXiv preprint arXiv:2311.18799 | 42 | 2023 |
| xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | L Xue, M Shu, A Awadalla, J Wang, A Yan, S Purushwalkam, H Zhou, et al. | arXiv preprint arXiv:2408.08872 | 40 | 2024 |
| MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | A Awadalla, L Xue, O Lo, M Shu, H Lee, EK Guha, M Jordan, S Shen, et al. | arXiv preprint arXiv:2406.11271 | 20 | 2024 |
| Directed Weighted Network Structure Analysis of Complex Impedance Measurements for Characterizing Oil-in-Water Bubbly Flow | ZK Gao, WD Dang, L Xue, SS Zhang | Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (3) | 15 | 2017 |
| REX: Rapid Exploration and Exploitation for AI Agents | R Murthy, S Heinecke, JC Niebles, Z Liu, L Xue, W Yao, Y Feng, Z Chen, et al. | arXiv preprint arXiv:2307.08962 | 8 | 2023 |
| Robustness Evaluation of Transformer-Based Form Field Extractors via Form Attacks | L Xue, M Gao, Z Chen, C Xiong, R Xu | International Conference on Document Analysis and Recognition, 167-184 | 6 | 2023 |
| xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | MS Ryoo, H Zhou, S Kendre, C Qin, L Xue, M Shu, S Savarese, R Xu, et al. | arXiv preprint arXiv:2410.16267 | 5 | 2024 |
| DocQueryNet: Value Retrieval with Arbitrary Queries for Form-Like Documents | M Gao, L Xue, C Ramaiah, C Xing, R Xu, C Xiong | Proceedings of the 29th International Conference on Computational … | 5* | 2022 |
| xGen-VideoSyn-1: High-Fidelity Text-to-Video Synthesis with Compressed Representations | C Qin, C Xia, K Ramakrishnan, M Ryoo, L Tu, Y Feng, M Shu, H Zhou, et al. | arXiv preprint arXiv:2408.12590 | 2 | 2024 |
| Image Analysis Based Document Processing for Inference of Key-Value Pairs in Non-Fixed Digital Documents | M Gao, C Zeyuan, L Xue, R Xu, C Xiong | US Patent 11,699,297 | 2 | 2023 |
| Model-Agnostic Hierarchical Attention for 3D Object Detection | M Shu, L Xue, N Yu, R Martín-Martín, JC Niebles, C Xiong, R Xu | arXiv preprint arXiv:2301.02650 | 2 | 2023 |
| ProVision: Programmatically Scaling Vision-Centric Instruction Data for Multimodal Language Models | J Zhang, L Xue, L Song, J Wang, W Huang, M Shu, A Yan, Z Ma, et al. | arXiv preprint arXiv:2412.07012 | 1 | 2024 |
| X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and Its Emergent Cross-Modal Reasoning | A Panagopoulou, L Xue, N Yu, J Li, D Li, S Joty, R Xu, S Savarese, et al. | European Conference on Computer Vision, 177-197 | 1 | 2024 |
| BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | A Awadalla, L Xue, M Shu, A Yan, J Wang, S Purushwalkam, S Shen, et al. | arXiv preprint arXiv:2411.07461 | | 2024 |
| Systems and Methods for Multi-Modal Language Models | A Panagopoulou, L Xue, N Yu, LI Junnan, D Li, S Savarese, SR Joty, et al. | US Patent App. 18/400,477 | | 2024 |