Xiaoqian Shen
CS PhD @ KAUST
Verified email at kaust.edu.sa
Title
Cited by
Year
MiniGPT-4: Enhancing vision-language understanding with advanced large language models
D Zhu, J Chen, X Shen, X Li, M Elhoseiny
ICLR'24, 2024
Cited by 2500, 2024
MiniGPT-v2: Large language model as a unified interface for vision-language multi-task learning
J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang, R Krishnamoorthi, ...
arXiv preprint arXiv:2310.09478, 2023
Cited by 521, 2023
ChatGPT asks, BLIP-2 answers: Automatic questioning towards enriched visual descriptions
D Zhu, J Chen, K Haydarov, X Shen, W Zhang, M Elhoseiny
TMLR, 2024
Cited by 97, 2024
HRS-Bench: Holistic, reliable and scalable benchmark for text-to-image models
EM Bakr, X Shen, P Sun, FF Khan, LE Li, M Elhoseiny
ICCV'23, 2023
Cited by 59, 2023
MiniGPT4-Video: Advancing multimodal LLMs for video understanding with interleaved visual-textual tokens
K Ataallah, X Shen, E Abdelrahman, E Sleiman, D Zhu, J Ding, ...
CVPRW'24, 2024
Cited by 47, 2024
MoStGAN-V: Video generation with temporal motion styles
X Shen, X Li, M Elhoseiny
CVPR'23, 2023
Cited by 38, 2023
Multi-ConDoS: Multimodal contrastive domain sharing generative adversarial networks for self-supervised medical image segmentation
J Zhang, S Zhang, X Shen, T Lukasiewicz, Z Xu
IEEE Transactions on Medical Imaging, 2023
Cited by 34, 2023
KeMRE: Knowledge-enhanced medical relation extraction for Chinese medicine instructions
T Qi, S Qiu, X Shen, H Chen, S Yang, H Wen, Y Zhang, Y Wu, Y Huang
Journal of Biomedical Informatics 120, 103834, 2021
Cited by 21, 2021
Exploring hierarchical graph representation for large-scale zero-shot image classification
K Yi, X Shen, Y Gou, M Elhoseiny
ECCV'22, 2022
Cited by 19, 2022
LongVU: Spatiotemporal adaptive compression for long video-language understanding
X Shen, Y Xiong, C Zhao, L Wu, J Chen, C Zhu, Z Liu, F Xiao, ...
arXiv preprint arXiv:2410.17434, 2024
Cited by 16, 2024
StoryGPT-V: Large Language Models as Consistent Story Visualizers
X Shen, M Elhoseiny
arXiv preprint arXiv:2312.02252, 2023
Cited by 9, 2023
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
K Ataallah, X Shen, E Abdelrahman, E Sleiman, M Zhuge, J Ding, D Zhu, ...
ECCV'24, 2024
Cited by 5, 2024
Affective visual dialog: A large-scale benchmark for emotional reasoning based on visually grounded conversations
K Haydarov, X Shen, A Madasu, M Salem, J Li, G Elsayed, M Elhoseiny
ECCV'24, 2024
Cited by 4, 2024
Adversarial Text to Continuous Image Generation
K Haydarov, A Muhamed, X Shen, J Lazarevic, I Skorokhodov, ...
CVPR'24, 2024
Cited by 4*, 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Z Ye, J Liu, R Peng, J Cao, Z Chen, Y Zhang, Z Xuan, M Zhou, X Shen, ...
arXiv preprint arXiv:2408.03695, 2024
2024
iMotion-LLM: Motion Prediction Instruction Tuning
A Felemban, EM Bakr, X Shen, J Ding, A Mohamed, M Elhoseiny
arXiv preprint arXiv:2406.06211, 2024
2024
EmoTalker: Audio Driven Emotion Aware Talking Head Generation
X Shen, FF Khan, M Elhoseiny
Proceedings of the Asian Conference on Computer Vision, 1900-1917, 2024
2024
ReferPix2Pix: Guiding Multi-Modal LLMs for Image Editing with Referential Pixel Grounding
X Shen, M Elhoseiny