Title | Authors | Venue | Cited by | Year |
MovieChat: From dense token to sparse memory for long video understanding | E Song, W Chai, G Wang, Y Zhang, H Zhou, F Wu, H Chi, X Guo, T Ye, ... | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 188 | 2024 |
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | E Song, W Chai, T Ye, JN Hwang, X Li, G Wang | arXiv preprint arXiv:2404.17176, 2024 | 19 | 2024 |
Meissonic: Revitalizing masked generative transformers for efficient high-resolution text-to-image synthesis | J Bai, T Ye, W Chow, E Song, QG Chen, X Li, Z Dong, L Zhu, S Yan | arXiv preprint arXiv:2410.08261, 2024 | 10 | 2024 |
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | W Chai, E Song, Y Du, C Meng, V Madhavan, O Bar-Tal, JN Hwang, S Xie, ... | arXiv preprint arXiv:2410.03051, 2024 | 5* | 2024 |
Devil in the Number: Towards Robust Multi-modality Data Filter | Y Xu, Z Xu, W Chai, Z Zhao, E Song, G Wang | arXiv preprint arXiv:2309.13770, 2023 | 3 | 2023 |
Knowledge graph extrapolation network with transductive learning for recommendation | R Ma, F Guo, L Zhao, B Mei, X Bu, H Wu, E Song | Applied Sciences 12 (10), 4899, 2022 | 2 | 2022 |
Fantasy: Transformer Meets Transformer in Text-to-Image Generation | E Song, W Chai, X Guo, G Wang, JN Hwang, Y Lu | | | 2024 |