Quantifying AI psychology: A psychometrics benchmark for large language models

Y Li, Y Huang, H Wang, X Zhang, J Zou… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities,
increasingly adopting roles akin to human-like assistants. The broader integration of LLMs …

Physician detection of clinical harm in machine translation: Quality estimation aids in reliance and backtranslation identifies critical errors

N Mehandru, S Agrawal, Y Xiao, EC Khoong… - arxiv preprint arxiv …, 2023 - arxiv.org
A major challenge in the practical use of Machine Translation (MT) is that users lack
guidance to make informed decisions about when to rely on outputs. Progress in quality …

Large language model benchmarks in medical tasks

LKQ Yan, Q Niu, M Li, Y Zhang, CH Yin, C Fei… - arxiv preprint arxiv …, 2024 - arxiv.org
With the increasing application of large language models (LLMs) in the medical domain,
evaluating these models' performance using benchmark datasets has become crucial. This …

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

S Schmidgall, R Ziaei, C Harris, E Reis… - arxiv preprint arxiv …, 2024 - arxiv.org
Diagnosing and managing a patient is a complex, sequential decision making process that
requires physicians to obtain information--such as which tests to perform--and to act upon it …

VietMed: A dataset and benchmark for automatic speech recognition of Vietnamese in the medical domain

K Le-Duc - arxiv preprint arxiv:2404.05659, 2024 - arxiv.org
Due to privacy restrictions, there's a shortage of publicly available speech recognition
datasets in the medical domain. In this work, we present VietMed, a Vietnamese speech …

Combining Language Models For Specialized Domains: A Colorful Approach

D Eitan, M Pirchi, N Glazer, S Meital, G Ayach… - arxiv preprint arxiv …, 2023 - arxiv.org
General purpose language models (LMs) encounter difficulties when processing domain-
specific jargon and terminology, which are frequently utilized in specialized fields such as …

Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond

M Sanni, T Abdullahi, DD Kayande, E Ayodele… - arxiv preprint arxiv …, 2025 - arxiv.org
Speech technologies are transforming interactions across various sectors, from healthcare
to call centers and robots, yet their performance on African-accented conversations remains …

MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations

VV Saley, G Saha, RJ Das, D Raghu - arxiv preprint arxiv:2410.14204, 2024 - arxiv.org
Medical task-oriented dialogue systems can assist doctors by collecting patient medical
history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout …

MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

K Le-Duc, P Phan, TH Pham, BP Tat, MH Ngo… - arxiv preprint arxiv …, 2024 - arxiv.org
Multilingual automatic speech recognition (ASR) in the medical domain serves as a
foundational task for various downstream applications such as speech translation, spoken …

Leveraging mobile NER for real-time capture of symptoms, diagnoses, and treatments from clinical dialogues

R Rhouma, C McMahon, D Mcgillivray… - Informatics in Medicine …, 2024 - Elsevier
In the dynamic world of healthcare technology, efficiently and accurately extracting medical
data from physician-patient conversations is vital. This paper presents a new approach in …