Reasoning about actions over visual and linguistic modalities: A survey
'Actions' play a vital role in how humans interact with the world and enable them to achieve
desired goals. As a result, most common sense (CS) knowledge for humans revolves …
Video2Commonsense: Generating commonsense descriptions to enrich video captioning
Captioning is a crucial and challenging task for video understanding. In videos that involve
active agents such as humans, the agent's actions can bring about myriad changes in the …
CRIPP-VQA: Counterfactual reasoning about implicit physical properties via video question answering
Videos often capture objects, their visible properties, their motion, and the interactions
between different objects. Objects also have physical properties such as mass, which the …
Neural constraint satisfaction: Hierarchical abstraction for combinatorial generalization in object rearrangement
Object rearrangement is a challenge for embodied agents because solving these tasks
requires generalizing across a combinatorially large set of configurations of entities and their …
Hierarchical abstraction for combinatorial generalization in object rearrangement
Object rearrangement is a challenge for embodied agents because solving these tasks
requires generalizing across a combinatorially large set of underlying entities that take the …
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Humans observe various actions being performed by other humans (physically or in
videos/images) and can draw a wide range of inferences about them beyond what they can …