Visual knowledge in the big model era: Retrospect and prospect

W Wang, Y Yang, Y Pan - Frontiers of Information Technology & Electronic …, 2025 - Springer
Visual knowledge is a new form of knowledge representation that can encapsulate visual
concepts and their relations in a succinct, comprehensive, and interpretable manner, with a …

Strategies to leverage foundational model knowledge in object affordance grounding

A Rai, K Buettner, A Kovashka - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
An important task for intelligent systems is affordance grounding where the goal is to locate
regions on an object where an action can be performed. Past weakly supervised …

WorldAfford: Affordance grounding based on natural language instructions

C Chen, Y Cong, Z Kan - 2024 IEEE 36th International …, 2024 - ieeexplore.ieee.org
Affordance grounding aims to localize the interaction regions for the manipulated objects in
the scene image according to given instructions, which is essential for Embodied AI and …

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

M Zhang, J Cai, M Liu, Y Xu, C Lu, YL Li - European Conference on …, 2024 - Springer
As a prominent research area, visual reasoning plays a crucial role in AI by facilitating
concept formation and interaction with the world. However, current works are usually carried …

Embodied AI Through Cloud-Fog Computing: A Framework for Everywhere Intelligence

D Hu, D Lan, Y Liu, J Ning, J Wang… - 2024 IEEE 33rd …, 2024 - ieeexplore.ieee.org
Embodied AI represents a crucial step towards achieving Artificial General Intelligence
(AGI). The next paradigm of Embodied AI involves physical embodiment, enhanced …

M-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Z Chen, J Li, L Tan, Y Guo, J Liang, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent robots need to interact with diverse objects across various environments. The
appearance and state of objects frequently undergo complex transformations depending on …

Coherent Physical Commonsense Reasoning in Foundational Language Models

S Storks - 2024 - deepblue.lib.umich.edu
Recent years in natural language processing (NLP) research have seen a paradigm shift
toward foundational language models (LMs), which are self-supervised, transformer-based …