A comprehensive survey on evaluating large language model applications in the medical industry

Y Huang, K Tang, M Chen, B Wang - arXiv preprint arXiv:2404.15777, 2024 - arxiv.org
Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs)
such as GPT and BERT have evolved significantly, impacting various industries with their …

A comprehensive evaluation of large language models on benchmark biomedical text processing tasks

I Jahan, MTR Laskar, C Peng, JX Huang - Computers in biology and …, 2024 - Elsevier
Abstract Recently, Large Language Models (LLMs) have demonstrated impressive
capability to solve a wide range of tasks. However, despite their success across various …

ChatGPT vs human-authored text: Insights into controllable text summarization and sentence style transfer

D Pu, V Demberg - arXiv preprint arXiv:2306.07799, 2023 - arxiv.org
Large-scale language models, like ChatGPT, have garnered significant media attention and
stunned the public with their remarkable capacity for generating coherent text from short …

Overview of the BioLaySumm 2024 shared task on the lay summarization of biomedical research articles

T Goldsack, C Scarton, M Shardlow, C Lin - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the setup and results of the second edition of the BioLaySumm shared
task on the Lay Summarisation of Biomedical Research Articles, hosted at the BioNLP …

FactKB: Generalizable factuality evaluation using language models enhanced with factual knowledge

S Feng, V Balachandran, Y Bai, Y Tsvetkov - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factual consistency of automatically generated summaries is essential for the
progress and adoption of reliable summarization systems. Despite recent advances, existing …

Ascle—a Python natural language processing toolkit for medical text generation: development and evaluation study

R Yang, Q Zeng, K You, Y Qiao, L Huang… - Journal of Medical …, 2024 - jmir.org
Background Medical texts present significant domain-specific challenges, and manually
curating these texts is a time-consuming and labor-intensive process. To address this …

MeetingBank: A benchmark dataset for meeting summarization

Y Hu, T Ganter, H Deilamsalehy, F Dernoncourt… - arXiv preprint arXiv …, 2023 - arxiv.org
As the number of recorded meetings increases, it becomes increasingly important to utilize
summarization technology to create useful summaries of these recordings. However, there is …

Retrieval augmentation of large language models for lay language generation

Y Guo, W Qiu, G Leroy, S Wang, T Cohen - Journal of Biomedical …, 2024 - Elsevier
The complex linguistic structures and specialized terminology of expert-authored content
limit the accessibility of biomedical literature to the general public. Automated methods have …

Improving biomedical abstractive summarisation with knowledge aggregation from citation papers

C Tang, S Wang, T Goldsack, C Lin - arXiv preprint arXiv:2310.15684, 2023 - arxiv.org
Abstracts derived from biomedical literature possess distinct domain-specific characteristics,
including specialised writing styles and biomedical terminologies, which necessitate a deep …

Language model as an annotator: Unsupervised context-aware quality phrase generation

Z Zhang, Y Zuo, C Lin, J Wu - Knowledge-Based Systems, 2024 - Elsevier
Phrase mining is a fundamental text mining task that aims to identify quality phrases from
context. Nevertheless, the scarcity of extensive gold-label datasets, demanding substantial …