Opportunities and challenges for ChatGPT and large language models in biomedicine and health

S Tian, Q **, L Yeganova, PT Lai, Q Zhu… - Briefings in …, 2024 - academic.oup.com
ChatGPT has drawn considerable attention from both the general public and domain experts
with its remarkable text generation capabilities. This has subsequently led to the emergence …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Benchmarking large language models on CMExam: A comprehensive Chinese medical exam dataset

J Liu, P Zhou, Y Hua, D Chong, Z Tian… - Advances in …, 2024 - proceedings.neurips.cc
Recent advancements in large language models (LLMs) have transformed the field of
question answering (QA). However, evaluating LLMs in the medical field is challenging due …

Can LLMs augment low-resource reading comprehension datasets? Opportunities and challenges

V Samuel, H Aynaou, AG Chowdhury… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a
wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A …

The dawn after the dark: An empirical study on factuality hallucination in large language models

J Li, J Chen, R Ren, X Cheng, WX Zhao, JY Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of large language models (LLMs), hallucination (i.e., the tendency to generate
factually incorrect content) poses a great challenge to trustworthy and reliable deployment of …

Linguistic calibration of long-form generations

N Band, X Li, T Ma… - Forty-first …, 2024 - storage.prod.researchhub.com
Language models (LMs) may lead their users to make suboptimal downstream
decisions when they confidently hallucinate. This issue can be mitigated by having the LM …

Is ChatGPT a biomedical expert? Exploring the zero-shot performance of current GPT models in biomedical tasks

S Ateia, U Kruschwitz - arXiv preprint arXiv:2306.16108, 2023 - arxiv.org
We assessed the performance of commercial Large Language Models (LLMs) GPT-3.5-Turbo
and GPT-4 on tasks from the 2023 BioASQ challenge. In Task 11b Phase B, which is …

Overview of BioASQ 2023: The eleventh BioASQ challenge on large-scale biomedical semantic indexing and question answering

A Nentidis, G Katsimpras, A Krithara… - … Conference of the Cross …, 2023 - Springer
This is an overview of the eleventh edition of the BioASQ challenge in the context of the
Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of …

LAB-Bench: Measuring capabilities of language models for biology research

JM Laurent, JD Janizek, M Ruzo, MM Hinks… - arXiv preprint arXiv …, 2024 - arxiv.org
There is widespread optimism that frontier Large Language Models (LLMs) and LLM-augmented
systems have the potential to rapidly accelerate scientific discovery across …

HaluEval-Wild: Evaluating hallucinations of language models in the wild

Z Zhu, Y Yang, Z Sun - arXiv preprint arXiv:2403.04307, 2024 - arxiv.org
Hallucinations pose a significant challenge to the reliability of large language models
(LLMs) in critical domains. Recent benchmarks designed to assess LLM hallucinations …