Top-down visual attention from analysis by synthesis

B Shi, T Darrell, X Wang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current attention algorithms (e.g., self-attention) are stimulus-driven and highlight all the
salient objects in an image. However, intelligent agents like humans often guide their …

Refocus the Attention for Parameter-Efficient Thermal Infrared Object Tracking

S Lai, C Liu, D Wang, H Lu - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
Introducing deep trackers to thermal infrared (TIR) tracking is hampered by the scarcity of
large training datasets. To alleviate the predicament, a common approach is full fine-tuning …

Biologically Inspired Learning Model for Instructed Vision

R Abel, S Ullman - Advances in Neural Information …, 2025 - proceedings.neurips.cc
As part of the effort to understand how the brain learns, ongoing research seeks to combine
biological knowledge with current artificial intelligence (AI) modeling in an attempt to find an …

Unsupervised representation for semantic segmentation by implicit cycle-attention contrastive learning

B Pang, Y Li, Y Zhang, G Peng, J Tang, K Zha… - Proceedings of the …, 2022 - ojs.aaai.org
We study the unsupervised representation learning for the semantic segmentation task.
Different from previous works that aim at providing unsupervised pre-trained backbones for …

PGT: A progressive method for training models on long videos

B Pang, G Peng, Y Li, C Lu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Convolutional video models have an order of magnitude larger computational complexity
than their counterpart image-level models. Constrained by computational resources, there is …

Markov Progressive Framework, a Universal Paradigm for Modeling Long Videos

B Pang, G Peng, Y Li, C Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
The computational complexity of video models increases linearly with the square number of
frames. Thus, constrained by computational resources, training video models to learn long …

Object part parsing with hierarchical dual transformer

J Chen, J Si, N Liu, Y Wu, L Niu, C Qian - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Object part parsing involves segmenting objects into semantic parts, which has drawn great
attention recently. The current methods ignore the specific hierarchical structure of the …

VVS: Action recognition with virtual view synthesis

G Peng, YL Li, H Zhu, J Tang, J Xia… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Action recognition research is usually in the single-view setting. But human action is not
single-view based in many cases. A lot of simple action is composed of both body …

Biologically-Motivated Learning Model for Instructed Visual Processing

R Abel, S Ullman - arXiv preprint arXiv:2306.02415, 2023 - arxiv.org
As part of understanding how the brain learns, ongoing work seeks to combine biological
knowledge and current artificial intelligence (AI) modeling in an attempt to find an efficient …

Introducing Feedback Connections for Vision Transformers

V Agarwal - newhonors.cs.umd.edu
The introduction of Transformer networks in computer vision has resulted in rapid progress
of deep models in a variety of vision tasks. These performance gains are strongly tied to the …