Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022‏ - nowpublishers.com‏

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …‏

Cited by 198 - All 7 versions

A review of recurrent neural networks: LSTM cells and network architectures‏

Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019‏ - direct.mit.edu‏

Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …‏

Cited by 4779 - All 7 versions

[PDF] thecvf.com

Deep hierarchical semantic segmentation‏

L Li, T Zhou, W Wang, J Li… - Proceedings of the IEEE …, 2022‏ - openaccess.thecvf.com‏

Humans are able to recognize structured relations in observation, allowing us to decompose
complex scenes into simpler parts and abstract the visual world in multiple levels. However …‏

Cited by 179 - All 9 versions

[PDF] thecvf.com

Visual semantic reasoning for image-text matching‏

K Li, Y Zhang, K Li, Y Li, Y Fu - Proceedings of the IEEE …, 2019‏ - openaccess.thecvf.com‏

Image-text matching has been a hot research topic bridging the vision and language areas.
It remains challenging because the current representation of image usually lacks global …‏

Cited by 650 - All 9 versions

[PDF] thecvf.com

IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval

H Chen, G Ding, X Liu, Z Lin, J Liu… - Proceedings of the …, 2020‏ - openaccess.thecvf.com‏

Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …‏

Cited by 434 - All 8 versions

Hierarchical deep click feature prediction for fine-grained image recognition‏

J Yu, M Tan, H Zhang, Y Rui… - IEEE transactions on …, 2019‏ - ieeexplore.ieee.org‏

The click feature of an image, defined as the user click frequency vector of the image on a
predefined word vocabulary, is known to effectively reduce the semantic gap for fine-grained …‏

Cited by 525 - All 6 versions

[PDF] thecvf.com

Stacked cross attention for image-text matching‏

KH Lee, X Chen, G Hua, H Hu… - Proceedings of the …, 2018‏ - openaccess.thecvf.com‏

In this paper, we study the problem of image-text matching. Inferring the latent semantic
alignment between objects or other salient stuff (e.g., snow, sky, lawn) and the corresponding …

Cited by 1456 - All 10 versions

[PDF] thecvf.com

Context-aware attention network for image-text retrieval‏

Q Zhang, Z Lei, Z Zhang, SZ Li - Proceedings of the IEEE …, 2020‏ - openaccess.thecvf.com‏

As a typical cross-modal problem, image-text bi-directional retrieval relies heavily on the
joint embedding learning and similarity measure for each image-text pair. It remains …‏

Cited by 294 - All 9 versions

[PDF] arxiv.org

Semantically self-aligned network for text-to-image part-aware person re-identification‏

Z Ding, C Ding, Z Shao, D Tao - arXiv preprint arXiv:2107.12666, 2021 - arxiv.org

Text-to-image person re-identification (ReID) aims to search for images containing a person
of interest using textual descriptions. However, due to the significant modality gap and the …‏

Cited by 163 - All 2 versions

[PDF] thecvf.com

CAMP: Cross-modal adaptive message passing for text-image retrieval

Z Wang, X Liu, H Li, L Sheng, J Yan… - Proceedings of the …, 2019‏ - openaccess.thecvf.com‏

Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …‏

Cited by 376 - All 8 versions

Hierarchical multimodal LSTM for dense visual-semantic embedding

Vision-language pre-training: Basics, recent advances, and future trends‏
