A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

A comprehensive survey on evaluating large language model applications in the medical industry

Y Huang, K Tang, M Chen, B Wang - ar…

… review of effectiveness, feasibility, and applications

M Casu, S Triscari, S Battiato, L Guarnera… - Appl. Sci, 2024 - mirkocasu.github.io
Mental health disorders are a leading cause of disability worldwide, and there is a global
shortage of mental health professionals. AI chatbots have emerged as a potential solution …

Automated legal consulting in construction procurement using metaheuristically optimized large language models

CY Liu, JS Chou - Automation in Construction, 2025 - Elsevier
This paper introduces a hybrid optimization algorithm, Pilgrimage Walk Optimization-
Differential Evolution (PWO-DE), inspired by Taiwan's cultural traditions, to fine-tune large …

Assessing and enhancing large language models in rare disease question-answering

G Wang, J Ran, R Tang, CY Chang, YN Chuang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the impressive capabilities of Large Language Models (LLMs) in general medical
domains, questions remain about their performance in diagnosing rare diseases. To answer …

Exploring the effectiveness of instruction tuning in biomedical language processing

O Rohanian, M Nouriborji, S Kouchaki… - Artificial intelligence in …, 2024 - Elsevier
Large Language Models (LLMs), particularly those similar to ChatGPT, have
significantly influenced the field of Natural Language Processing (NLP). While these models …

Can large language models fix data annotation errors? An empirical study using Debatepedia for query-focused text summarization

MTR Laskar, M Rahman, I Jahan… - Findings of the …, 2023 - aclanthology.org
Debatepedia is a publicly available dataset consisting of arguments and counter-arguments
on controversial topics that has been widely used for the single-document query-focused …

Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

XY Fu, MTR Laskar, E Khasanova, C Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide
range of tasks without being explicitly fine-tuned on task-specific datasets. However …