A read-write memory network for movie story understanding S Na, S Lee, J Kim, G Kim Proceedings of the IEEE International Conference on Computer Vision, 677-685, 2017 | 132 | 2017 |
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action J Lu, C Clark, S Lee, Z Zhang, S Khosla, R Marten, D Hoiem, A Kembhavi Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 103 | 2024 |
Parameter efficient multimodal transformers for video representation learning S Lee, Y Yu, G Kim, T Breuel, J Kautz, Y Song International Conference on Learning Representations, 2021 | 94 | 2021 |
A memory network approach for story-based temporal summarization of 360° videos S Lee, J Sung, Y Yu, G Kim Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 80 | 2018 |
A deep ranking model for spatio-temporal highlight detection from a 360° video Y Yu, S Lee, J Na, J Kang, G Kim Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018 | 53 | 2018 |
ACAV100M: Automatic curation of large-scale datasets for audio-visual video representation learning S Lee, J Chung, Y Yu, G Kim, T Breuel, G Chechik, Y Song Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 50 | 2021 |
Molmo and PixMo: Open weights and open data for state-of-the-art multimodal models M Deitke, C Clark, S Lee, R Tripathi, Y Yang, JS Park, M Salehi, ... arXiv preprint arXiv:2409.17146, 2024 | 37 | 2024 |
Self-Supervised Learning of Compressed Video Representations Y Yu, S Lee, G Kim, Y Song International Conference on Learning Representations, 2021 | 19 | 2021 |
Encoding video and label priors for multi-label video classification on YouTube-8M dataset S Na, Y Yu, S Lee, J Kim, G Kim arXiv preprint arXiv:1706.07960, 2017 | 14 | 2017 |
Unsupervised representation learning via neural activation coding Y Park, S Lee, G Kim, D Blei International Conference on Machine Learning, 8391-8400, 2021 | 6 | 2021 |
Can Language Models Laugh at YouTube Short-form Videos? D Ko, S Lee, G Kim Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 3 | 2023 |
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation S Ha, C Kim, D Kim, J Lee, S Lee, J Lee Proceedings of the European Conference on Computer Vision (ECCV), 121-137, 2024 | | 2024 |