Video frame interpolation: A comprehensive survey

J Dong, K Ota, M Dong - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org
Video Frame Interpolation (VFI) is a fascinating and challenging problem in the computer
vision (CV) field, aiming to generate non-existing frames between two consecutive video …

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Facial expression recognition with visual transformers and attentional selective fusion

F Ma, B Sun, S Li - IEEE Transactions on Affective Computing, 2021 - ieeexplore.ieee.org
Facial Expression Recognition (FER) in the wild is extremely challenging due to occlusions,
variant head poses, face deformation and motion blur under unconstrained conditions …

OadTR: Online action detection with transformers

X Wang, S Zhang, Z Qing, Y Shao… - Proceedings of the …, 2021 - openaccess.thecvf.com
Most recent approaches for online action detection tend to apply Recurrent Neural Network
(RNN) to capture long-range temporal structure. However, RNN suffers from non-parallelism …

A transformer-based feature segmentation and region alignment method for UAV-view geo-localization

M Dai, J Hu, J Zhuang, E Zheng - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cross-view geo-localization is a task of matching the same geographic image from different
views, e.g., unmanned aerial vehicle (UAV) and satellite. The most difficult challenges are the …

Neural video depth stabilizer

Y Wang, M Shi, J Li, Z Huang, Z Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video depth estimation aims to infer temporally consistent depth. Some methods achieve
temporal consistency by finetuning a single-image depth model during test time using …

CCTNet: Coupled CNN and transformer network for crop segmentation of remote sensing images

H Wang, X Chen, T Zhang, Z Xu, J Li - Remote Sensing, 2022 - mdpi.com
Semantic segmentation by using remote sensing images is an efficient method for
agricultural crop classification. Recent solutions in crop segmentation are mainly deep …

Spike transformer: Monocular depth estimation for spiking camera

J Zhang, L Tang, Z Yu, J Lu, T Huang - European Conference on Computer …, 2022 - Springer
Spiking camera is a bio-inspired vision sensor that mimics the sampling mechanism of the
primate fovea, which has shown great potential for capturing high-speed dynamic scenes …