Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arxiv preprint arxiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Clinical text datasets for medical artificial intelligence and large language models—a systematic review

J Wu, X Liu, M Li, W Li, Z Su, S Lin, L Garay, Z Zhang… - NEJM AI, 2024 - ai.nejm.org
Privacy and ethical considerations limit access to large-scale clinical datasets, particularly
clinical text data, which contain extensive and diverse information and serve as the …

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

K He, R Mao, Q Lin, Y Ruan, X Lan, M Feng… - Information …, 2025 - Elsevier
The utilization of large language models (LLMs) for Healthcare has generated both
excitement and concern due to their ability to effectively respond to free-text queries with …

HuatuoGPT, towards taming language model to be a doctor

H Zhang, J Chen, F Jiang, F Yu, Z Chen, J Li… - arxiv preprint arxiv …, 2023 - arxiv.org
In this paper, we present HuatuoGPT, a large language model (LLM) for medical
consultation. The core recipe of HuatuoGPT is to leverage both distilled data from …

Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

CMB: A comprehensive medical benchmark in Chinese

X Wang, GH Chen, D Song, Z Zhang, Z Chen… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) provide a possibility to make a great breakthrough in
medicine. The establishment of a standardized medical benchmark becomes a fundamental …

DISC-MedLLM: Bridging general large language models and real-world medical consultation

Z Bao, W Chen, S Xiao, K Ren, J Wu, C Zhong… - arxiv preprint arxiv …, 2023 - arxiv.org
We propose DISC-MedLLM, a comprehensive solution that leverages Large Language
Models (LLMs) to provide accurate and truthful medical response in end-to-end …

BioBART: Pretraining and evaluation of a biomedical generative language model

H Yuan, Z Yuan, R Gan, J Zhang, Y Xie… - arxiv preprint arxiv …, 2022 - arxiv.org
Pretrained language models have served as important backbones for natural language
processing. Recently, in-domain pretraining has been shown to benefit various domain …

Foundation model for advancing healthcare: challenges, opportunities and future directions

Y He, F Huang, X Jiang, Y Nie, M Wang… - IEEE Reviews in …, 2024 - ieeexplore.ieee.org
Foundation model, trained on a diverse range of data and adaptable to a myriad of tasks, is
advancing healthcare. It fosters the development of healthcare artificial intelligence (AI) …

HuatuoGPT-II, one-stage training for medical adaption of LLMs

J Chen, X Wang, K Ji, A Gao, F Jiang, S Chen… - arxiv preprint arxiv …, 2023 - arxiv.org
Adapting a language model into a specific domain, aka 'domain adaption', is a common
practice when specialized knowledge, e.g. medicine, is not encapsulated in a general …