From recognition to cognition: Visual commonsense reasoning
Visual understanding goes well beyond object recognition. With one glance at an image, we
can effortlessly imagine the world beyond the pixels: for instance, we can infer people's …
Swag: A large-scale adversarial dataset for grounded commonsense inference
R Zellers, Y Bisk, R Schwartz, Y Choi - ar…
… down U-net for segmentation of biomedical images on platforms with low computational budgets
During image segmentation tasks in computer vision, achieving high accuracy performance
while requiring fewer computations and faster inference is a big challenge. This is especially …
Procedure planning in instructional videos
In this paper, we study the problem of procedure planning in instructional videos, which can
be seen as a step towards enabling autonomous agents to plan for complex tasks in …
Event-guided procedure planning from instructional videos with text supervision
In this work, we focus on the task of procedure planning from instructional videos with text
supervision, where a model aims to predict an action sequence to transform the initial visual …
Egodistill: Egocentric head motion distillation for efficient video understanding
Recent advances in egocentric video understanding models are promising, but their heavy
computational expense is a barrier for many real-world applications. To address this …