- Academic Search

I Croitoru, SV Bogolin, M Leordeanu, H **… - Artificial Intelligence, 2025 - Elsevier

In recent years, considerable progress on the task of text-video retrieval has been achieved
by leveraging large-scale pretraining on visual and audio datasets to construct powerful …

All 4 versions

[HTML] mdpi.com

Enabling Perspective-Aware Ai with Contextual Scene Graph Generation

D Platnick, M Alirezaie, H Rahnama - Information, 2024 - mdpi.com

This paper advances contextual image understanding within perspective-aware Ai (PAi), an
emerging paradigm in human–computer interaction that enables users to perceive and …

All 4 versions

[PDF] aclanthology.org

Aligning images and text with semantic role labels for fine-grained cross-modal understanding

A Bhattacharyya, C Mauceri, M Palmer… - Proceedings of the …, 2022 - aclanthology.org

As vision processing and natural language processing continue to advance, there is
increasing interest in multimodal applications, such as image retrieval, caption generation …

Cited by 3

[PDF] arxiv.org

NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

D Herron, E Jiménez-Ruiz, G Tarroni… - arXiv preprint arXiv …, 2023 - arxiv.org

NeSy4VRD is a multifaceted resource designed to support the development of
neurosymbolic AI (NeSy) research. NeSy4VRD re-establishes public access to the images …

Cited by 1 · All 2 versions

[PDF] duke.edu

Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation

Y Lin, D Liu, Y Xu, H Suo, M Li - 2024 IEEE 14th International …, 2024 - ieeexplore.ieee.org

Generating novel voices in speech synthesis is a challenging task with potential for creating
versatile voices that are needed in entertainment and research. One of the primary obstacles …

All 4 versions

Enhanced Dense Image Captioning Based On Transformers

T Goswami, S Potu, KP Reddy… - 2024 8th …, 2024 - ieeexplore.ieee.org

The paper introduces a pioneering work that explores the fusion of computer vision and
natural language processing for narrative generation. We propose an innovative …

Enhance the message passing of key nodes in scene graph generation

H Qiu, Y Sun, X Luo - Proceedings of the 5th International Conference …, 2024 - dl.acm.org

Scene graph generation is an important approach in the field of visual scene understanding.
Several current studies have aimed at how to extract more robust relational features …

Multi-view Attention Networks for Visual Question Answering

M Li, Z Bai, J Deng - 2024 6th International Conference on …, 2024 - ieeexplore.ieee.org

Visual question answering (VQA) is a typical multimodal task that necessitates a
combination of computer vision and natural language processing expertise. The …

[PDF] whiterose.ac.uk

Navigating Multimodal Complexity: Advances in Model Design, Dataset Creation, and Evaluation Techniques

PGJ Vickers - 2024 - etheses.whiterose.ac.uk

Ibn Sina, a philosopher of 11th-century Persia, wrote of a 'Floating Man'. This man is floating
through a void, without the use of his sight or touch or any of the senses which make us …

All 2 versions

[PDF] neeharperi.com

A simple technical report about the Foundational Few-Shot Object Detection Challenge

Q Chen, J Ge, W **, L Yu - neeharperi.com

Abstract: A technical report on our using method on the Foundational Few Shot Object Detection …

Visual genome: Connecting language and vision using crowdsourced dense image annotations....

TeachText: CrossModal text-video retrieval through generalized distillation