From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation
Abstract Natural Image Captioning (NIC) is an interdisciplinary research area that lies within
the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several …
Interactive and explainable region-guided radiology report generation
The automatic generation of radiology reports has the potential to assist radiologists in the
time-consuming task of report writing. Existing methods generate the full report from image …
GRiT: A generative region-to-text transformer for object understanding
This paper presents a Generative RegIon-to-Text transformer, GRiT, for object
understanding. The spirit of GRiT is to formulate object understanding as <region, text> …
Dual-level representation enhancement on characteristic and context for image-text retrieval
S Yang, Q Li, W Li, X Li, AA Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received
growing attention since it connects heterogeneous data. Previous methods that perform well …
Caption anything: Interactive image description with diverse multimodal controls
Controllable image captioning is an emerging multimodal topic that aims to describe the
image with natural language following human purpose, e.g., looking at the …
Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Abstract Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …
Long dialogue emotion detection based on commonsense knowledge graph guidance
Dialogue emotion detection is always challenging due to human subjectivity and the
randomness of dialogue content. In a conversation, the emotion of each person often …
CoF-Net: A progressive coarse-to-fine framework for object detection in remote-sensing imagery
Object detection in remote-sensing images is a crucial task in the fields of Earth observation
and computer vision. Despite impressive progress in modern remote-sensing object …
Deep unsupervised part-whole relational visual saliency
Y Liu, X Dong, D Zhang, S Xu - Neurocomputing, 2024 - Elsevier
Abstract Deep Supervised Salient Object Detection (SSOD) excessively relies on large-scale
annotated pixel-level labels which consume intensive labour acquiring high-quality …
Textual context-aware dense captioning with diverse words
Dense captioning generates more detailed spoken descriptions for complex visual scenes.
Despite several promising leads, existing methods still have two broad limitations: 1) The …