- Academic Search

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 396

[PDF] sciencedirect.com

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 109

[PDF] thecvf.com

GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation

T Wu, G Yang, Z Li, K Zhang, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Despite recent advances in text-to-3D generative methods there is a notable absence of
reliable evaluation metrics. Existing metrics usually focus on a single criterion each such as …

Cited by 67

[PDF] thecvf.com

GPT4Point: A unified framework for point-language understanding and generation

Z Qi, Y Fang, Z Sun, X Wu, T Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation but their understanding of the 3D world is notably …

Cited by 26

[PDF] thecvf.com

The Neglected Tails in Vision-Language Models

S Parashar, Z Lin, T Liu, X Dong, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vision-language models (VLMs) excel in zero-shot recognition but their performance varies
greatly across different visual concepts. For example although CLIP achieves impressive …

Cited by 30

[PDF] arxiv.org

BenchLMM: Benchmarking cross-style visual capability of large multimodal models

R Cai, Z Song, D Guan, Z Chen, Y Li, X Luo… - … on Computer Vision, 2024 - Springer

Abstract Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown
remarkable capabilities in visual reasoning on data in common image styles. However, their …

Cited by 33

High-order interaction learning for image captioning

Y Wang, N Xu, AA Liu, W Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Image captioning aims at understanding various semantic concepts (e.g., objects and
relationships) from an image and integrating them in a sentence-level description. Hence, it …

Cited by 87

[PDF] thecvf.com

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

M Lyu, Y Yang, H Hong, H Chen, X Jin… - Proceedings of the …, 2024 - openaccess.thecvf.com

The prevalent use of commercial and open-source diffusion models (DMs) for text-to-image
generation prompts risk mitigation to prevent undesired behaviors. Existing concept erasing …

Cited by 33

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 39

[PDF] port.ac.uk

Visuals to text: A comprehensive review on automatic image captioning

Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk

Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …

Cited by 44

Improving image captioning with better use of captions

From show to tell: A survey on deep learning-based image captioning
