A survey of efficient fine-tuning methods for vision-language models—prompt and adapter
J **ng, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier
Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
From pixels to graphs: Open-vocabulary scene graph generation with vision-language models
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph
representation for downstream reasoning tasks. Despite recent advancements, existing …
Graph neural networks in vision-language image understanding: a survey
2D image understanding is a complex problem within computer vision, but it holds
the key to providing human-level scene comprehension. It goes further than identifying the …
OED: towards one-stage end-to-end dynamic scene graph generation
Dynamic Scene Graph Generation (DSGG) focuses on identifying visual
relationships within the spatial-temporal domain of videos. Conventional approaches often …
M3S: Scene graph driven multi-granularity multi-task learning for multi-modal NER
J Wang, Y Yang, K Liu, Z Zhu… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only
NER with visual information, has recently attracted considerable attention. Most current …
VQA-GNN: Reasoning with multimodal knowledge via graph neural networks for visual question answering
Visual question answering (VQA) requires systems to perform concept-level reasoning by
unifying unstructured (e.g., the context in question and answer; "QA context") and structured …
Multi-level knowledge-driven feature representation and triplet loss optimization network for image–text retrieval
X Qin, L Li, F Hao, M Ge, G Pang - Information Processing & Management, 2024 - Elsevier
Image–text retrieval plays a considerable role in associating vision and language. Existing
mainstream approaches focus on fine-grained alignment while ignoring the influence of …
Multimodal event causality reasoning with scene graph enhanced interaction network
Multimodal event causality reasoning aims to recognize the causal relations based on the
given events and accompanying image pairs, requiring the model to have a comprehensive …
Knowledge-embedded mutual guidance for visual reasoning
W Zheng, L Yan, L Chen, Q Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Visual reasoning between visual images and natural language is a long-standing challenge
in computer vision. Most of the methods aim to look for answers to questions only on the …