ChatTracker: Enhancing visual tracking performance via chatting with multimodal large language model

Y Sun, F Yu, S Chen, Y Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Visual object tracking aims to locate a targeted object in a video sequence based on an
initial bounding box. Recently, Vision-Language (VL) trackers have proposed to utilize …

RAGT: learning robust features for occluded human pose and shape estimation with attention-guided transformer

Z Li, Y Li, S Lin - International Conference on Computer-Aided Design …, 2023 - Springer
3D human pose and shape estimation from monocular images is a fundamental
task in computer vision, but it is highly ill-posed and challenging due to occlusion. Occlusion …