CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of
different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in …
SRFormer: Permuted self-attention for single image super-resolution
Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …
VRT: A video restoration transformer
Video restoration aims to restore high-quality frames from low-quality frames. Different from
single image restoration, video restoration generally requires to utilize temporal information …
Recurrent video restoration transformer with guided deformable attention
Video restoration aims at restoring multiple high-quality frames from multiple low-quality
frames. Existing video restoration methods generally fall into two extreme cases, i.e., they …
A survey on generative AI and LLM for video generation, understanding, and streaming
This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …
Video event restoration based on keyframes for video anomaly detection
Video anomaly detection (VAD) is a significant computer vision problem. Existing deep
neural network (DNN) based VAD methods mostly follow the route of frame reconstruction or …
CycMuNet+: Cycle-projected mutual learning for spatial-temporal video super-resolution
Spatial-Temporal Video Super-Resolution (ST-VSR) aims to generate high-quality videos
with higher resolution (HR) and higher frame rate (HFR). Quite intuitively, pioneering two …
Spherical space feature decomposition for guided depth map super-resolution
Guided depth map super-resolution (GDSR), as a hot topic in multi-modal image processing,
aims to upsample low-resolution (LR) depth maps with additional information involved in …
TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
Vision-centric joint perception and prediction (PnP) has become an emerging trend in
autonomous driving research. It predicts the future states of the traffic participants in the …
Constructing holistic spatio-temporal scene graph for video semantic role labeling
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …