From recognition to cognition: Visual commonsense reasoning

R Zellers, Y Bisk, A Farhadi… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …

SWAG: A large-scale adversarial dataset for grounded commonsense inference

R Zellers, Y Bisk, R Schwartz, Y Choi - arXiv …

Stripping down U-net for segmentation of biomedical images on platforms with low computational budgets

PK Gadosey, Y Li, EA Agyekum, T Zhang, Z Liu… - Diagnostics, 2020 - mdpi.com
During image segmentation tasks in computer vision, achieving high accuracy performance
while requiring fewer computations and faster inference is a big challenge. This is especially …

Procedure planning in instructional videos

CY Chang, DA Huang, D Xu, E Adeli, L Fei-Fei… - … on Computer Vision, 2020 - Springer
In this paper, we study the problem of procedure planning in instructional videos, which can
be seen as a step towards enabling autonomous agents to plan for complex tasks in …

Event-guided procedure planning from instructional videos with text supervision

AL Wang, KY Lin, JR Du, J Meng… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we focus on the task of procedure planning from instructional videos with text
supervision, where a model aims to predict an action sequence to transform the initial visual …

EgoDistill: Egocentric head motion distillation for efficient video understanding

S Tan, T Nagarajan, K Grauman - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent advances in egocentric video understanding models are promising, but their heavy
computational expense is a barrier for many real-world applications. To address this …