CLVIN: Complete language-vision interaction network for visual question answering

C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier
The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …

SGUIE-Net: Semantic attention guided underwater image enhancement with multi-scale perception

Q Qi, K Li, H Zheng, X Gao, G Hou… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the wavelength-dependent light attenuation, refraction and scattering, underwater
images usually suffer from color distortion and blurred details. However, due to the limited …

Region-object relation-aware dense captioning via transformer

Z Shao, J Han, D Marnerides… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Dense captioning provides detailed captions of complex visual scenes. While a number of
successes have been achieved in recent years, there are still two broad limitations: 1) most …

Deep fuzzy hashing network for efficient image retrieval

H Lu, M Zhang, X Xu, Y Li… - IEEE Transactions on Fuzzy …, 2020 - ieeexplore.ieee.org
Hashing methods for efficient image retrieval aim at learning hash functions that map similar
images to semantically correlated binary codes in the Hamming space with similarity well …

FashionVLP: Vision language transformer for fashion retrieval with feedback

S Goenka, Z Zheng, A Jaiswal… - Proceedings of the …, 2022 - openaccess.thecvf.com
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …

An overview of recent work in media forensics: Methods and threats

K Bhagtani, AKS Yadav, ER Bartusiak, Z Xiang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we review recent work in media forensics for digital images, video, audio
(specifically speech), and documents. For each data modality, we discuss synthesis and …

Towards multi-modal sarcasm detection via hierarchical congruity modeling with knowledge enhancement

H Liu, W Wang, H Li - arXiv preprint arXiv:2210.03501, 2022 - arxiv.org
Sarcasm is a linguistic phenomenon indicating a discrepancy between literal meanings and
implied intentions. Due to its sophisticated nature, it is usually challenging to be detected …

Nearest neighbor-based contrastive learning for hyperspectral and LiDAR data classification

M Wang, F Gao, J Dong, HC Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The joint hyperspectral image (HSI) and light detection and ranging (LiDAR) data
classification aims to interpret ground objects at a more detailed and precise level. Although …

Sarcasm driven by sentiment: A sentiment-aware hierarchical fusion network for multimodal sarcasm detection

H Liu, R Wei, G Tu, J Lin, C Liu, D Jiang - Information Fusion, 2024 - Elsevier
Sarcasm is a form of sentiment expression that highlights the disparity between a person's
true intentions and the content they explicitly present. With the exponential increase in …

CoSMo: Content-style modulation for image retrieval with text feedback

S Lee, D Kim, B Han - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
We tackle the task of image retrieval with text feedback, where a reference image and
modifier text are combined to identify the desired target image. We focus on designing an …