High-fidelity document stain removal via a large-scale real-world dataset and a memory-augmented transformer

M Li, H Sun, Y Lei, X Zhang, Y Dong, Y Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org
Document images are often degraded by various stains, significantly impacting their
readability and hindering downstream applications such as document digitization and …

Test-time intensity consistency adaptation for shadow detection

L Zhu, W Liu, X Chen, Z Li, X Chen, Z Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Shadow detection is crucial for accurate scene understanding in computer vision, yet it is
challenged by the diverse appearances of shadows caused by variations in illumination …

Beyond model adaptation at test time: A survey

Z Xiao, CGM Snoek - arxiv preprint arxiv:2411.03687, 2024 - arxiv.org
Machine learning algorithms have achieved remarkable success across various disciplines,
use cases and applications, under the prevailing assumption that training and test samples …

Adaptive query prompting for multi-domain landmark detection

Q Wei, G Huang, X Yuan, X Chen, G Zhong… - arxiv preprint arxiv …, 2024 - arxiv.org
Medical landmark detection is crucial in various medical imaging modalities and
procedures. Although deep learning-based methods have achieved promising performance …

Forgeryttt: Zero-shot image manipulation localization with test-time training

W Liu, X Shen, CM Pun, X Cun - arxiv preprint arxiv:2410.04032, 2024 - arxiv.org
Social media is increasingly plagued by realistic fake images, making it hard to trust content.
Previous algorithms to detect these fakes often fail in new, real-world scenarios because …

Uwformer: Underwater image enhancement via a semi-supervised multi-scale transformer

W Chen, Y Lei, S Luo, Z Zhou, M Li… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Underwater images often exhibit poor quality, distorted color balance and low contrast due
to the complex and intricate interplay of light, water, and objects. Despite the significant …

Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models

C Han, H Wang, J Kuang, L Zhang, J Gui - arxiv preprint arxiv:2501.13795, 2025 - arxiv.org
Existing zero-shot temporal action detection (ZSTAD) methods predominantly use fully
supervised or unsupervised strategies to recognize unseen activities. However, these …

Implicit multi-spectral transformer: An lightweight and effective visible to infrared image translation model

Y Chen, P Chen, X Zhou, Y Lei… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
In the field of computer vision, visible light images often exhibit low contrast in low-light
conditions, presenting a significant challenge. While infrared imagery provides a potential …

Docdeshadower: Frequency-aware transformer for document shadow removal

Z Zhou, Y Lei, X Chen, S Luo, W Zhang… - … on Systems, Man …, 2024 - ieeexplore.ieee.org
Shadows in scanned documents pose significant challenges for document analysis and
recognition tasks due to their negative impact on visual quality and readability. Current …

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

X Bi, J Lu, B Liu, X Cun, Y Zhang, W Li… - arxiv preprint arxiv …, 2024 - arxiv.org
Benefiting from large-scale pre-training of text-video pairs, current text-to-video (T2V)
diffusion models can generate high-quality videos from the text description. Besides, given …