Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
A deep cross-modal neural cognitive diagnosis framework for modeling student performance
In intelligent education systems, one fundamental task is to predict student performance on
new exercises and estimate the knowledge proficiency of students on knowledge concepts …
Cross-modal contrastive learning for domain adaptation in 3d semantic segmentation
Domain adaptation for 3D point cloud has attracted a lot of interest since it can
avoid the time-consuming labeling process of 3D data to some extent. A recent work named …
Ta-Adapter: Enhancing few-shot CLIP with task-aware encoders
Contrastive Language-Image Pre-training (CLIP) has shown impressive zero-shot
transfer capabilities, but its potential for specific downstream tasks is not fully utilized. To …
Evaluating out-of-distribution performance on document image classifiers
The ability of a document classifier to handle inputs that are drawn from a distribution
different from the training distribution is crucial for robust deployment and generalizability …
F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models
The zero-shot classification performance of large-scale vision-language pre-training models
(e.g., CLIP, BLIP and ALIGN) can be enhanced by incorporating a prompt (e.g., “a photo of a …
Visually-Rich Document Understanding: Concepts, Taxonomy and Challenges
The increasing prevalence of Visually-rich Documents (VRDs) in diverse domains has led to
a growing interest in Visually-rich Document Understanding (VrDU). Researchers have …
Multi-schema prompting powered token-feature woven attention network for short text classification
Z Cai, H Zhang, P Zhan, X Jia, Y Yan, X Song, B **e - Pattern Recognition, 2024 - Elsevier
Short text classification task poses challenges in natural language processing due to
insufficient contextual information. This task is typically approached by extracting rich …
Enhancing automatic placenta analysis through distributional feature recomposition in vision-language contrastive learning
The placenta is a valuable organ that can aid in understanding adverse events during
pregnancy and predicting issues post-birth. Manual pathological examination and report …
Beyond Document Page Classification: Design, Datasets, and Challenges
This paper highlights the need to bring document classification benchmarking closer to
real-world applications, both in the nature of data tested (X: multi-channel, multi-paged, multi …