A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets
K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …
Visual semantic reasoning for image-text matching
Image-text matching has been a hot research topic bridging the vision and language areas.
It remains challenging because the current representation of image usually lacks global …
Image de-raining transformer
Existing deep learning based de-raining approaches have resorted to the convolutional
architectures. However, the intrinsic limitations of convolution, including local receptive fields …
Artistic style transfer with internal-external learning and contrastive learning
Although existing artistic style transfer methods have achieved significant improvement with
deep neural networks, they still suffer from artifacts such as disharmonious colors and …
Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection
With the springing up of face synthesis techniques, it is prominent in need to develop
powerful face forgery detection methods due to security concerns. Some existing methods …
StyleFormer: Real-time arbitrary style transfer via parametric style composition
In this work, we propose a new feed-forward arbitrary style transfer method, referred to as
StyleFormer, which can simultaneously fulfill fine-grained style diversity and semantic …
Image-text embedding learning via visual and textual semantic reasoning
As a bridge between language and vision domains, cross-modal retrieval between images
and texts is a hot research topic in recent years. It remains challenging because the current …
TSIT: A simple and versatile framework for image-to-image translation
We introduce a simple and versatile framework for image-to-image translation. We unearth
the importance of normalization layers, and provide a carefully designed two-stream …
DynaST: Dynamic sparse transformer for exemplar-guided image generation
One key challenge of exemplar-guided image generation lies in establishing fine-grained
correspondences between input and guided images. Prior approaches, despite the …
Rethinking zero-shot learning: A conditional visual classification perspective
Zero-shot learning (ZSL) aims to recognize instances of unseen classes solely based on the
semantic descriptions of the classes. Existing algorithms usually formulate it as a semantic …