Knowledge graphs meet multi-modal learning: A comprehensive survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …
Unified-IO: A unified model for vision, language, and multi-modal tasks
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …
Multi-modal knowledge graph construction and application: A survey
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most existing knowledge graphs are …
CLIP-Event: Connecting text and images with event structures
Vision-language (V+L) pretraining models have achieved great success in
supporting multimedia applications by understanding the alignments between images and …
Going beyond nouns with vision & language models using synthetic data
P Cascante-Bonilla, K Shehada… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale pre-trained Vision & Language (VL) models have shown remarkable
performance in many applications, enabling the replacement of a fixed set of supported classes with …
Teaching structured vision & language concepts to vision & language models
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …
Dense and aligned captions (DAC) promote compositional reasoning in VL models
Vision and Language (VL) models offer an effective method for aligning representation
spaces of images and text, allowing for numerous applications such as cross-modal retrieval …
VL-CheckList: Evaluating pre-trained vision-language models with objects, attributes and relations
Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-
modal downstream tasks. Most existing works evaluated their systems by comparing the fine …
VALSE: A task-independent benchmark for vision and language models centered on linguistic phenomena
L Parcalabescu, M Cafagna, L Muradjan… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark
designed for testing general-purpose pretrained vision and language (V&L) models for their …
Learning transferable human-object interaction detector with natural language supervision
It is difficult to construct a data collection including all possible combinations of human
actions and interacting objects due to the combinatorial nature of human-object interactions …