VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding

Y Fan, X Ma, R Wu, Y Du, J Li, Z Gao, Q Li - European Conference on …, 2024 - Springer
We explore how reconciling several foundation models (large language models and vision-
language models) with a novel unified memory mechanism could tackle the challenging …

Advances in 3D generation: A survey

X Li, Q Zhang, D Kang, W Cheng, Y Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating 3D models lies at the core of computer graphics and has been the focus of
decades of research. With the emergence of advanced neural representations and …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

LLM-Seg: Bridging image segmentation and large language model reasoning

J Wang, L Ke - Proceedings of the IEEE/CVF Conference …, 2024 - openaccess.thecvf.com
Understanding human instructions to identify the target objects is vital for perception
systems. In recent years the advancements of Large Language Models (LLMs) have …

EgoThink: Evaluating first-person perspective thinking capability of vision-language models

S Cheng, Z Guo, J Wu, K Fang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities with the …

Octopi: Object property reasoning with large tactile-language models

S Yu, K Lin, A Xiao, J Duan, H Soh - arXiv preprint arXiv:2405.02794, 2024 - arxiv.org
Physical reasoning is important for effective robot manipulation. Recent work has
investigated both vision and language modalities for physical reasoning; vision can reveal …

EgoChoir: Capturing 3D human-object interaction regions from egocentric views

Y Yang, W Zhai, C Wang, C Yu, Y Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-
centric perception, facilitating applications like AR/VR and embodied AI. For the egocentric …

Continual learning in the presence of repetition

H Hemati, L Pellegrini, X Duan, Z Zhao, F Xia… - Neural Networks, 2025 - Elsevier
Continual learning (CL) provides a framework for training models in ever-evolving
environments. Although re-occurrence of previously seen objects or tasks is common in real …

ActionVOS: Actions as prompts for video object segmentation

L Ouyang, R Liu, Y Huang, R Furuta, Y Sato - European Conference on …, 2024 - Springer
Delving into the realm of egocentric vision, the advancement of referring video object
segmentation (RVOS) stands as pivotal in understanding human activities. However …

EAGLE: Egocentric AGgregated Language-video Engine

J Bi, Y Tang, L Song, A Vosoughi, N Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of egocentric video analysis brings new insights into understanding
human activities and intentions from a first-person perspective. Despite this progress, the …