NTIRE 2023 challenge on efficient super-resolution: Methods and results

Y Li, Y Zhang, R Timofte, L Van Gool… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper reviews the NTIRE 2023 challenge on efficient single-image super-resolution
with a focus on the proposed solutions and results. The aim of this challenge is to devise a …

Visual ChatGPT: Talking, drawing and editing with visual foundation models

C Wu, S Yin, W Qi, X Wang, Z Tang, N Duan - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT is attracting a cross-field interest as it provides a language interface with
remarkable conversational competency and reasoning capabilities across many domains …

The Segment Anything Model (SAM) for remote sensing applications: From zero to one shot

LP Osco, Q Wu, EL De Lemos, WN Gonçalves… - International Journal of …, 2023 - Elsevier
Segmentation is an essential step for remote sensing image processing. This study aims to
advance the application of the Segment Anything Model (SAM), an innovative image …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European conference on …, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Fine-tuned CLIP models are efficient video learners

H Rasheed, MU Khattak, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP
model. Since training on a similar scale for videos is infeasible, recent approaches focus on …

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …

DaViT: Dual attention vision transformers

M Ding, B Xiao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …

DilateFormer: Multi-scale dilated transformer for visual recognition

J Jiao, YM Tang, KY Lin, Y Gao, AJ Ma… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
As a de facto solution, the vanilla Vision Transformers (ViTs) are encouraged to model long-
range dependencies between arbitrary image patches while the global attended receptive …