Quantifying AI psychology: A psychometrics benchmark for large language models
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities,
increasingly adopting roles akin to human-like assistants. The broader integration of LLMs …
Physician detection of clinical harm in machine translation: Quality estimation aids in reliance and backtranslation identifies critical errors
A major challenge in the practical use of Machine Translation (MT) is that users lack
guidance to make informed decisions about when to rely on outputs. Progress in quality …
Large language model benchmarks in medical tasks
With the increasing application of large language models (LLMs) in the medical domain,
evaluating these models' performance using benchmark datasets has become crucial. This …
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Diagnosing and managing a patient is a complex, sequential decision making process that
requires physicians to obtain information--such as which tests to perform--and to act upon it …
VietMed: A dataset and benchmark for automatic speech recognition of Vietnamese in the medical domain
K Le-Duc - arXiv preprint arXiv:2404.05659, 2024 - arxiv.org
Due to privacy restrictions, there's a shortage of publicly available speech recognition
datasets in the medical domain. In this work, we present VietMed, a Vietnamese speech …
Combining Language Models For Specialized Domains: A Colorful Approach
D Eitan, M Pirchi, N Glazer, S Meital, G Ayach… - arXiv preprint arXiv …, 2023 - arxiv.org
General purpose language models (LMs) encounter difficulties when processing domain-
specific jargon and terminology, which are frequently utilized in specialized fields such as …
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
M Sanni, T Abdullahi, DD Kayande, E Ayodele… - arXiv preprint arXiv …, 2025 - arxiv.org
Speech technologies are transforming interactions across various sectors, from healthcare
to call centers and robots, yet their performance on African-accented conversations remains …
MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations
Medical task-oriented dialogue systems can assist doctors by collecting patient medical
history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout …
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
Multilingual automatic speech recognition (ASR) in the medical domain serves as a
foundational task for various downstream applications such as speech translation, spoken …
Leveraging mobile NER for real-time capture of symptoms, diagnoses, and treatments from clinical dialogues
R Rhouma, C McMahon, D Mcgillivray… - Informatics in Medicine …, 2024 - Elsevier
In the dynamic world of healthcare technology, efficiently and accurately extracting medical
data from physician-patient conversations is vital. This paper presents a new approach in …