A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Abstract Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

Leveraging biomolecule and natural language through multi-modal learning: A survey

Q Pei, L Wu, K Gao, J Zhu, Y Wang, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The integration of biomolecular modeling with natural language (BL) has emerged as a
promising interdisciplinary area at the intersection of artificial intelligence, chemistry and …

Reading subtext: Evaluating large language models on short story summarization with writers

M Subbiah, S Zhang, LB Chilton… - Transactions of the …, 2024 - direct.mit.edu
Abstract We evaluate recent Large Language Models (LLMs) on the challenging task of
summarizing short stories, which can be lengthy, and include nuanced subtext or scrambled …

A comprehensive survey on evaluating large language model applications in the medical industry

Y Huang, K Tang, M Chen, B Wang - arXiv preprint arXiv:2404.15777, 2024 - arxiv.org
Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs)
such as GPT and BERT have evolved significantly, impacting various industries with their …

Unveiling LLM evaluation focused on metrics: Challenges and solutions

T Hu, XH Zhou - arXiv preprint arXiv:2404.09135, 2024 - arxiv.org
Natural Language Processing (NLP) is witnessing a remarkable breakthrough driven by the
success of Large Language Models (LLMs). LLMs have gained significant attention across …

A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports

M Sushil, T Zack, D Mandair, Z Zheng… - Journal of the …, 2024 - academic.oup.com
Objective Although supervised machine learning is popular for information extraction from
clinical notes, creating large annotated datasets requires extensive domain expertise and is …

Bioinformatics and biomedical informatics with ChatGPT: Year one review

J Wang, Z Cheng, Q Yao, L Liu, D Xu… - Quantitative Biology, 2024 - Wiley Online Library
The year 2023 marked a significant surge in the exploration of applying large language
model chatbots, notably Chat Generative Pre‐trained Transformer (ChatGPT), across …

Tiny titans: Can smaller large language models punch above their weight in the real world for meeting summarization?

XY Fu, MTR Laskar, E Khasanova, C Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide
range of tasks without being explicitly fine-tuned on task-specific datasets. However …

Large language models in the clinic: a comprehensive benchmark

F Liu, Z Li, H Zhou, Q Yin, J Yang, X Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
The adoption of large language models (LLMs) to assist clinicians has attracted remarkable
attention. Existing works mainly adopt the close-ended question-answering (QA) task with …

An evaluation of large language models in bioinformatics research

H Yin, Z Gu, F Wang, Y Abuduhaibaier, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) such as ChatGPT have gained considerable interest across
diverse research communities. Their notable ability for text completion and generation has …