Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Partial visual-semantic embedding: Fine-grained outfit image representation with massive volumes of tags via angular-based contrastive learning
A novel technology named fashion intelligence system, which quantifies ambiguous
expressions unique to fashion, such as “casual,” “adult-casual,” and “office-casual,” was …
Debiased momentum contrastive learning for multimodal video similarity measures
The growing potential of multimodal short videos has contributed to a new type of
recommendation. It depends on effectively measuring the similarities between the short …
Revisiting pre-training in audio-visual learning
Pre-training technique has gained tremendous success in enhancing model performance on
various tasks, but found to perform worse than training from scratch in some uni-modal …
Partial Visual-Semantic Embedding: Fashion Intelligence System with Sensitive Part-by-Part Learning
R Shimizu, T Nakamura, M Goto - arXiv preprint arXiv:2211.06688, 2022 - arxiv.org
In this study, we propose a technology called the Fashion Intelligence System based on the
visual-semantic embedding (VSE) model to quantify abstract and complex expressions …
[CITATION][C] Fine-Grained Multimodal Entity Linking for Videos
赵海全, 王续武, **金亮, **直旭, 肖仰华 - Journal of Software, 2023
[CITATION][C] Fine-Grained Multimodal Entity Linking for Videos (面向视频的细粒度多模态实体链接)
赵海全, 王续武, **金亮, **直旭, 肖仰华 - Journal of Software (软件学报), 2023