Multi-modal machine learning in engineering design: A review and future directions

B Song, R Zhou, F Ahmed - … of Computing and …, 2024 - asmedigitalcollection.asme.org
In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of
multiple data modalities has the potential to reshape various applications. This paper …

A survey on multimodal bidirectional machine learning translation of image and natural language processing

W Nam, B Jang - Expert Systems with Applications, 2024 - Elsevier
Advances in multimodal machine learning help artificial intelligence to resemble human
intellect more closely, which perceives the world from multiple modalities. We surveyed state …

Inversion-based style transfer with diffusion models

Y Zhang, N Huang, F Tang, H Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The artistic style within a painting is the means of expression, which includes not only the
painting material, colors, and brushstrokes, but also the high-level attributes, including …

Long-CLIP: Unlocking the long-text capability of CLIP

B Zhang, P Zhang, X Dong, Y Zang, J Wang - European Conference on …, 2024 - Springer
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot
classification, text-image retrieval, and text-image generation by aligning image and …

Iterative prompt learning for unsupervised backlit image enhancement

Z Liang, C Li, S Zhou, R Feng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT,
by exploring the potential of Contrastive Language-Image Pre-Training (CLIP) for pixel …

High-resolution image synthesis with latent diffusion models

R Rombach, A Blattmann, D Lorenz… - Proceedings of the …, 2022 - openaccess.thecvf.com
By decomposing the image formation process into a sequential application of denoising
autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image …

Text2LIVE: Text-driven layered image and video editing

O Bar-Tal, D Ofri-Amar, R Fridman, Y Kasten… - European conference on …, 2022 - Springer
We present a method for zero-shot, text-driven editing of natural images and videos. Given
an image or a video and a text prompt, our goal is to edit the appearance of existing objects …

Zero-shot text-guided object generation with dream fields

A Jain, B Mildenhall, JT Barron… - Proceedings of the …, 2022 - openaccess.thecvf.com
We combine neural rendering with multi-modal image and text representations to synthesize
diverse 3D objects solely from natural language descriptions. Our method, Dream Fields …

AvatarCLIP: Zero-shot text-driven generation and animation of 3D avatars

F Hong, M Zhang, L Pan, Z Cai, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
3D avatar creation plays a crucial role in the digital age. However, the whole production
process is prohibitively time-consuming and labor-intensive. To democratize this technology …

MotionCLIP: Exposing human motion generation to CLIP space

G Tevet, B Gordon, A Hertz, AH Bermano… - … on Computer Vision, 2022 - Springer
We introduce MotionCLIP, a 3D human motion auto-encoder featuring a latent embedding
that is disentangled, well behaved, and supports highly semantic textual descriptions …