From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation

G Reale-Nosei, E Amador-Domínguez… - Medical Image Analysis, 2024 - Elsevier
Natural Image Captioning (NIC) is an interdisciplinary research area that lies within
the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several …

Interactive and explainable region-guided radiology report generation

T Tanida, P Müller, G Kaissis… - Proceedings of the …, 2023 - openaccess.thecvf.com
The automatic generation of radiology reports has the potential to assist radiologists in the
time-consuming task of report writing. Existing methods generate the full report from image …

GRiT: A generative region-to-text transformer for object understanding

J Wu, J Wang, Z Yang, Z Gan, Z Liu, J Yuan… - European Conference on …, 2024 - Springer
This paper presents a Generative RegIon-to-Text transformer, GRiT, for object
understanding. The spirit of GRiT is to formulate object understanding as <region, text> …

Dual-level representation enhancement on characteristic and context for image-text retrieval

S Yang, Q Li, W Li, X Li, AA Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received
growing attention since it connects heterogeneous data. Previous methods that perform well …

Caption anything: Interactive image description with diverse multimodal controls

T Wang, J Zhang, J Fei, H Zheng, Y Tang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Controllable image captioning is an emerging multimodal topic that aims to describe the
image with natural language following human purpose, e.g., looking at the …

Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier
Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

Long dialogue emotion detection based on commonsense knowledge graph guidance

W Nie, Y Bao, Y Zhao, A Liu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Dialogue emotion detection is always challenging due to human subjectivity and the
randomness of dialogue content. In a conversation, the emotion of each person often …

Cof-net: A progressive coarse-to-fine framework for object detection in remote-sensing imagery

C Zhang, KM Lam, Q Wang - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
Object detection in remote-sensing images is a crucial task in the fields of Earth observation
and computer vision. Despite impressive progress in modern remote-sensing object …

Deep unsupervised part-whole relational visual saliency

Y Liu, X Dong, D Zhang, S Xu - Neurocomputing, 2024 - Elsevier
Deep Supervised Salient Object Detection (SSOD) excessively relies on large-scale
annotated pixel-level labels which consume intensive labour acquiring high quality …

Textual context-aware dense captioning with diverse words

Z Shao, J Han, K Debattista… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Dense captioning generates more detailed spoken descriptions for complex visual scenes.
Despite several promising leads, existing methods still have two broad limitations: 1) The …