Content-based and knowledge-enriched representations for classification across modalities: a survey

N Pittaras, G Giannakopoulos, P Stamatopoulos… - ACM Computing …, 2023 - dl.acm.org
This survey documents representation approaches for classification across different
modalities, from purely content-based methods to techniques utilizing external sources of …

CLIP-TD: CLIP targeted distillation for vision-language tasks

Z Wang, N Codella, YC Chen, L Zhou, J Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Contrastive language-image pretraining (CLIP) links vision and language modalities into a
unified embedding space, yielding tremendous potential for vision-language (VL) tasks …

Camera on-boarding for person re-identification using hypothesis transfer learning

SM Ahmed, AR Lejbolle, R Panda… - Proceedings of the …, 2020 - openaccess.thecvf.com
Most of the existing approaches for person re-identification consider a static setting where
the number of cameras in the network is fixed. An interesting direction, which has received …

Improving visual question answering by combining scene-text information

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
The text present in natural scenes contains semantic information about its surrounding
environment. For example, the majority of questions asked by blind people related to images …

Learning to respond with stickers: A framework of unifying multi-modality in multi-turn dialog

S Gao, X Chen, C Liu, L Liu, D Zhao… - Proceedings of the Web …, 2020 - dl.acm.org
Stickers with vivid and engaging expressions are becoming increasingly popular in online
messaging apps, and some works are dedicated to automatically selecting sticker responses by …

Multimodal adaptive distillation for leveraging unimodal encoders for vision-language tasks

Z Wang, N Codella, YC Chen, L Zhou, X Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully
curated vision-language datasets. While these datasets reach an order of 10 million …

Transferring domain-agnostic knowledge in video question answering

T Wu, N Garcia, M Otani, C Chu, Y Nakashima… - arXiv preprint arXiv …, 2021 - arxiv.org
Video question answering (VideoQA) is designed to answer a given question based on a
relevant video clip. The current available large-scale datasets have made it possible to …

Vision to language: Methods, metrics and datasets

N Sharif, U Nadeem, SAA Shah, M Bennamoun… - … Paradigms: Advances in …, 2020 - Springer
Alan Turing's pioneering vision from the 1950s of machines capable of thinking like
humans is still what Artificial Intelligence (AI) and Deep Learning research aspires to …

Learning to respond with your favorite stickers: A framework of unifying multi-modality and user preference in multi-turn dialog

S Gao, X Chen, L Liu, D Zhao, R Yan - ACM Transactions on Information …, 2021 - dl.acm.org
Stickers with vivid and engaging expressions are becoming increasingly popular in online
messaging apps, and some works are dedicated to automatically selecting sticker responses by …

Decoupled box proposal and featurization with ultrafine-grained semantic labels improve image captioning and visual question answering

S Changpinyo, B Pang, P Sharma, R Soricut - arXiv preprint arXiv …, 2019 - arxiv.org
Object detection plays an important role in current solutions to vision and language tasks
like image captioning and visual question answering. However, popular models like Faster …