Object-region video transformers R Herzig, E Ben-Avraham, K Mangalam, A Bar, G Chechik, A Rohrbach, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 99 | 2022 |
Question aware vision transformer for multimodal reasoning R Ganz, Y Kittenplon, A Aberdam, E Ben Avraham, O Nuriel, S Mazor, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 17 | 2024 |
PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data R Herzig, O Abramovich, E Ben Avraham, A Arbelle, L Karlinsky, A Shamir, ... Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024 | 17 | 2024 |
Bringing image scene structure to video via frame-clip consistency of object tokens E Ben Avraham, R Herzig, K Mangalam, A Bar, A Rohrbach, L Karlinsky, ... Advances in Neural Information Processing Systems 35, 26839-26855, 2022 | 16 | 2022 |
GRAM: Global reasoning for multi-page VQA T Blau, S Fogel, R Ronen, A Golts, R Ganz, E Ben Avraham, A Aberdam, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 8 | 2024 |
Structured video tokens @ Ego4D PNR temporal localization challenge 2022 E Ben-Avraham, R Herzig, K Mangalam, A Bar, A Rohrbach, L Karlinsky, ... arXiv preprint arXiv:2206.07689, 2022 | 3 | 2022 |
DocVLM: Make Your VLM an Efficient Reader M Shpigel Nacson, A Aberdam, R Ganz, E Ben Avraham, A Golts, ... arXiv preprint arXiv:2412.08746, 2024 | | 2024 |