Multi-modal masked autoencoders for medical vision-and-language pre-training

Z Chen, Y Du, J Hu, Y Liu, G Li, X Wan… - … Conference on Medical …, 2022 - Springer
Medical vision-and-language pre-training provides a feasible solution to extract effective
vision-and-language representations from medical images and texts. However, few studies …

Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge

Z Chen, G Li, X Wan - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Medical vision-and-language pre-training (Med-VLP) has received considerable attention
owing to its applicability to extracting generic vision-and-language representations from …

DUE: End-to-end document understanding benchmark

Ł Borchmann, M Pietruszka, T Stanislawek… - Thirty-fifth Conference …, 2021 - openreview.net
Understanding documents with rich layouts plays a vital role in digitization and
hyper-automation but remains a challenging topic in the NLP research community. Additionally, the …

Automatic related work generation: A meta study

X Li, J Ouyang - arXiv preprint arXiv …, 2022 - arxiv.org

Mapping medical image-text to a joint space via masked modeling

Z Chen, Y Du, J Hu, Y Liu, G Li, X Wan… - Medical Image Analysis, 2024 - Elsevier
Recently, masked autoencoders have demonstrated their feasibility in extracting effective
image and text features (eg, BERT for natural language processing (NLP) and MAE in …

Medical vision language pretraining: A survey

P Shrestha, S Amgain, B Khanal, CA Linte… - arXiv preprint arXiv …, 2023 - arxiv.org
Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to
the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and …

UniDCP: Unifying multiple medical vision-language tasks via dynamic cross-modal learnable prompts

C Zhan, Y Zhang, Y Lin, G Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Medical vision-language pre-training (Med-VLP) models have recently accelerated the
fast-growing medical diagnostics application. However, most Med-VLP models learn task …

Multimodality for NLP-centered applications: Resources, advances and frontiers

M Garg, S Wazarkar, M Singh… - Proceedings of the …, 2022 - aclanthology.org
With the development of multimodal systems and natural language generation techniques,
the resurgence of multimodal datasets has attracted significant research interest, which …

OVQA: A clinically generated visual question answering dataset

Y Huang, X Wang, F Liu, G Huang - … of the 45th International ACM SIGIR …, 2022 - dl.acm.org
Medical visual question answering (Med-VQA) is a challenging problem that aims to take a
medical image and a clinical question about the image as input and output a correct answer …

Deep Fuzzy Multi-Teacher Distillation Network for Medical Visual Question Answering

Y Liu, B Chen, S Wang, G Lu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Medical visual question answering (Medical VQA) is a critical cross-modal interaction task
that has garnered considerable attention in the medical domain. Several existing methods …
that garnered considerable attention in the medical domain. Several existing methods …