CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion

Z Zhao, H Bai, J Zhang, Y Zhang, S Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of
different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in …

SRFormer: Permuted self-attention for single image super-resolution

Y Zhou, Z Li, CL Guo, S Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …

VRT: A video restoration transformer

J Liang, J Cao, Y Fan, K Zhang… - … on Image Processing, 2024 - ieeexplore.ieee.org
Video restoration aims to restore high-quality frames from low-quality frames. Different from
single image restoration, video restoration generally requires to utilize temporal information …

Recurrent video restoration transformer with guided deformable attention

J Liang, Y Fan, X Xiang, R Ranjan… - Advances in …, 2022 - proceedings.neurips.cc
Video restoration aims at restoring multiple high-quality frames from multiple low-quality
frames. Existing video restoration methods generally fall into two extreme cases, i.e., they …

A survey on generative AI and LLM for video generation, understanding, and streaming

P Zhou, L Wang, Z Liu, Y Hao, P Hui, S Tarkoma… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …

Video event restoration based on keyframes for video anomaly detection

Z Yang, J Liu, Z Wu, P Wu, X Liu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Video anomaly detection (VAD) is a significant computer vision problem. Existing deep
neural network (DNN) based VAD methods mostly follow the route of frame reconstruction or …

CycMuNet+: Cycle-projected mutual learning for spatial-temporal video super-resolution

M Hu, K Jiang, Z Wang, X Bai… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Spatial-Temporal Video Super-Resolution (ST-VSR) aims to generate high-quality videos
with higher resolution (HR) and higher frame rate (HFR). Quite intuitively, pioneering two …

Spherical space feature decomposition for guided depth map super-resolution

Z Zhao, J Zhang, X Gu, C Tan, S Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Guided depth map super-resolution (GDSR), as a hot topic in multi-modal image processing,
aims to upsample low-resolution (LR) depth maps with additional information involved in …

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

S Fang, Z Wang, Y Zhong, J Ge… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-centric joint perception and prediction (PnP) has become an emerging trend in
autonomous driving research. It predicts the future states of the traffic participants in the …

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …