Weakly supervised object localization and detection: A survey

D Zhang, J Han, G Cheng… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …

Advances in deep concealed scene understanding

DP Fan, GP Ji, P Xu, MM Cheng, C Sakaridis… - Visual Intelligence, 2023 - Springer
Concealed scene understanding (CSU) is a hot computer vision topic aiming to perceive
objects exhibiting camouflage. The current boom in terms of techniques and applications …

AnyDoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

Tracking anything with decoupled video segmentation

HK Cheng, SW Oh, B Price… - Proceedings of the …, 2023 - openaccess.thecvf.com
Training data for video segmentation are expensive to annotate. This impedes extensions of
end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …

Segment anything is not always perfect: An investigation of SAM on different real-world applications

W Ji, J Li, Q Bi, T Liu, W Li, L Cheng - 2024 - Springer
Recently, Meta AI Research approaches a general, promptable segment anything
model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B) …

XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model

HK Cheng, AG Schwing - European Conference on Computer Vision, 2022 - Springer
We present XMem, a video object segmentation architecture for long videos with unified
feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video …

MVImgNet: A large-scale dataset of multi-view images

X Yu, M Xu, Y Zhang, H Liu, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth
of ImageNet drives a remarkable trend of "learning from large-scale data" in computer vision …

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational visual media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Putting the object back into video object segmentation

HK Cheng, SW Oh, B Price, JY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Cutie, a video object segmentation (VOS) network with object-level memory
reading, which puts the object representation from memory back into the video object …

VisionLLM v2: An end-to-end generalist multimodal large language model for hundreds of vision-language tasks

J Wu, M Zhong, S Xing, Z Lai, Z Liu… - Advances in …, 2025 - proceedings.neurips.cc
We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that
unifies visual perception, understanding, and generation within a single framework. Unlike …