A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

Visual semantic reasoning for image-text matching

K Li, Y Zhang, K Li, Y Li, Y Fu - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Image-text matching has been a hot research topic bridging the vision and language areas.
It remains challenging because the current representation of image usually lacks global …

Image de-raining transformer

J Xiao, X Fu, A Liu, F Wu, ZJ Zha - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
Existing deep learning based de-raining approaches have resorted to the convolutional
architectures. However, the intrinsic limitations of convolution, including local receptive fields …

Artistic style transfer with internal-external learning and contrastive learning

H Chen, Z Wang, H Zhang, Z Zuo, A Li… - Advances in …, 2021 - proceedings.neurips.cc
Although existing artistic style transfer methods have achieved significant improvement with
deep neural networks, they still suffer from artifacts such as disharmonious colors and …

Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection

Y Wang, K Yu, C Chen, X Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
With the springing up of face synthesis techniques, it is prominent in need to develop
powerful face forgery detection methods due to security concerns. Some existing methods …

StyleFormer: Real-time arbitrary style transfer via parametric style composition

X Wu, Z Hu, L Sheng, D Xu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In this work, we propose a new feed-forward arbitrary style transfer method, referred to as
StyleFormer, which can simultaneously fulfill fine-grained style diversity and semantic …

Image-text embedding learning via visual and textual semantic reasoning

K Li, Y Zhang, K Li, Y Li, Y Fu - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
As a bridge between language and vision domains, cross-modal retrieval between images
and texts is a hot research topic in recent years. It remains challenging because the current …

TSIT: A simple and versatile framework for image-to-image translation

L Jiang, C Zhang, M Huang, C Liu, J Shi… - Computer Vision–ECCV …, 2020 - Springer
We introduce a simple and versatile framework for image-to-image translation. We unearth
the importance of normalization layers, and provide a carefully designed two-stream …

DynaST: Dynamic sparse transformer for exemplar-guided image generation

S Liu, J Ye, S Ren, X Wang - European Conference on Computer Vision, 2022 - Springer
One key challenge of exemplar-guided image generation lies in establishing fine-grained
correspondences between input and guided images. Prior approaches, despite the …

Rethinking zero-shot learning: A conditional visual classification perspective

K Li, MR Min, Y Fu - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Zero-shot learning (ZSL) aims to recognize instances of unseen classes solely based on the
semantic descriptions of the classes. Existing algorithms usually formulate it as a semantic …