The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com

Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Cited by 218

[PDF] springer.com

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer

Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Cited by 1904

[PDF] thecvf.com

Dual aggregation transformer for image super-resolution

Z Chen, Y Zhang, J Gu, L Kong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Transformer has recently gained considerable popularity in low-level vision tasks, including
image super-resolution (SR). These networks utilize self-attention along different …

Cited by 210

[PDF] neurips.cc

SegNeXt: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc

We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …

Cited by 706

[PDF] arxiv.org

Efficient multi-scale attention module with cross-spatial learning

D Ouyang, S He, G Zhang, M Luo… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Remarkable effectiveness of the channel or spatial attention mechanisms for producing
more discernible feature representation are illustrated in various computer vision tasks …

Cited by 698

[PDF] springer.com

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer

While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Cited by 795

[PDF] arxiv.org

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org

Image captioning is a fundamental task in vision-language understanding, where the model
predicts a textual informative caption to a given input image. In this paper, we present a …

Cited by 769

[PDF] neurips.cc

Egocentric video-language pretraining

KQ Lin, J Wang, M Soldan, M Wray… - Advances in …, 2022 - proceedings.neurips.cc

Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …

Cited by 179

[PDF] arxiv.org

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …

Cited by 788

[PDF] arxiv.org

Contextual object detection with multimodal large language models

Y Zang, W Li, J Han, K Zhou, CC Loy - International Journal of Computer …, 2024 - Springer

Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-language
tasks, such as image captioning and question answering, but lack the essential …

Cited by 79

The multi-modal fusion in visual question answering: a review of attention mechanisms
