From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation
Despite recent advances in text-to-3D generative methods, there is a notable absence of
reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as …
GPT4Point: A unified framework for point-language understanding and generation
Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation, but their understanding of the 3D world is notably …
The Neglected Tails in Vision-Language Models
Vision-language models (VLMs) excel in zero-shot recognition, but their performance varies
greatly across different visual concepts. For example, although CLIP achieves impressive …
BenchLMM: Benchmarking cross-style visual capability of large multimodal models
Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown
remarkable capabilities in visual reasoning on data in common image styles. However, their …
High-order interaction learning for image captioning
Y Wang, N Xu, AA Liu, W Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Image captioning aims at understanding various semantic concepts (e.g., objects and
relationships) from an image and integrating them in a sentence-level description. Hence, it …
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
The prevalent use of commercial and open-source diffusion models (DMs) for text-to-image
generation prompts risk mitigation to prevent undesired behaviors. Existing concept erasing …
Deep image captioning: A review of methods, trends and future challenges
Image captioning, also called report generation in the medical field, aims to describe the visual
content of images in human language, which requires modeling semantic relationship …
Visuals to text: A comprehensive review on automatic image captioning
Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk
Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …