Shizhe Chen
INRIA Paris
Verified email at inria.fr
Title
Cited by
Year
Fine-grained video-text retrieval with hierarchical graph reasoning
S Chen, Y Zhao, Q Jin, Q Wu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020
Cited by 380, 2020
Say as you wish: Fine-grained control of image caption generation with abstract scene graphs
S Chen, Q Jin, P Wang, Q Wu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020
Cited by 275, 2020
History aware multimodal transformer for vision-and-language navigation
S Chen, PL Guhur, C Schmid, I Laptev
Advances in neural information processing systems 34, 5834-5847, 2021
Cited by 241, 2021
Speech emotion recognition with acoustic and lexical features
Q Jin, C Li, S Chen, H Wu
2015 IEEE international conference on acoustics, speech and signal …, 2015
Cited by 226, 2015
Multimodal multi-task learning for dimensional and continuous emotion recognition
S Chen, Q Jin, J Zhao, S Wang
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 19-26, 2017
Cited by 174, 2017
Think global, act local: Dual-scale graph transformer for vision-and-language navigation
S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by 159, 2022
Airbert: In-domain pretraining for vision-and-language navigation
PL Guhur, M Tapaswi, S Chen, I Laptev, C Schmid
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by 155, 2021
Multi-modal dimensional emotion recognition using recurrent neural networks
S Chen, Q Jin
Proceedings of the 5th International Workshop on Audio/Visual Emotion …, 2015
Cited by 146, 2015
WenLan: Bridging vision and language by large-scale multi-modal pre-training
Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ...
arXiv preprint arXiv:2103.06561, 2021
Cited by 145, 2021
Elaborative rehearsal for zero-shot action recognition
S Chen, D Huang
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by 126, 2021
Describing videos using multi-modal fusion
Q Jin, J Chen, S Chen, Y Xiong, A Hauptmann
Proceedings of the 24th ACM international conference on Multimedia, 1087-1091, 2016
Cited by 119, 2016
Instruction-driven history-aware policies for robotic manipulations
PL Guhur, S Chen, RG Pinel, M Tapaswi, I Laptev, C Schmid
Conference on Robot Learning, 175-187, 2023
Cited by 106, 2023
Sketch, ground, and refine: Top-down dense video captioning
C Deng, S Chen, D Chen, Y He, Q Wu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by 82, 2021
Multi-modal conditional attention fusion for dimensional emotion prediction
S Chen, Q Jin
Proceedings of the 24th ACM international conference on Multimedia, 571-575, 2016
Cited by 78, 2016
Video captioning with guidance of multimodal latent topics
S Chen, J Chen, Q Jin, A Hauptmann
Proceedings of the 25th ACM international conference on Multimedia, 1838-1846, 2017
Cited by 74, 2017
Language conditioned spatial relation reasoning for 3d object grounding
S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev
Advances in neural information processing systems 35, 20522-20535, 2022
Cited by 66, 2022
Few-shot action recognition with hierarchical matching and contrastive learning
S Zheng, S Chen, Q Jin
European conference on computer vision, 297-313, 2022
Cited by 60, 2022
Multi-modal multi-cultural dimensional continues emotion recognition in dyadic interactions
J Zhao, R Li, S Chen, Q Jin
Proceedings of the 2018 on audio/visual emotion challenge and workshop, 65-72, 2018
Cited by 56, 2018
Learning from unlabeled 3d environments for vision-and-language navigation
S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev
European Conference on Computer Vision, 638-655, 2022
Cited by 46, 2022
Unpaired cross-lingual image caption generation with self-supervised rewards
Y Song, S Chen, Y Zhao, Q Jin
Proceedings of the 27th ACM international conference on multimedia, 784-792, 2019
Cited by 46, 2019