A toolbox for surfacing health equity harms and biases in large language models

SR Pfohl, H Cole-Lewis, R Sayres, D Neal, M Asiedu… - Nature Medicine, 2024 - nature.com
Large language models (LLMs) hold promise to serve complex health information needs but
also have the potential to introduce harm and exacerbate health disparities. Reliably …

Who validates the validators? Aligning LLM-assisted evaluation of LLM outputs with human preferences

S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org
Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …

A survey on employing large language models for text-to-SQL tasks

L Shi, Z Tang, N Zhang, X Zhang, Z Yang - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing volume of data in relational databases and the expertise needed for writing
SQL queries pose challenges for users to access and analyze data. Text-to-SQL (Text2SQL) …

" We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

MX Liu, F Liu, AJ Fiannaca, T Koo, L Dixon… - Extended Abstracts of …, 2024 - dl.acm.org
Large language models can produce creative and diverse responses. However, to integrate
them into current developer workflows, it is essential to constrain their outputs to follow …

Designing a dashboard for transparency and control of conversational AI

Y Chen, A Wu, T DePodesta, C Yeh, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational LLMs function as black box systems, leaving users guessing about why they
see the output they do. This lack of transparency is potentially problematic, especially given …

Understanding the dataset practitioners behind large language models

C Qian, E Reif, M Kahng - Extended Abstracts of the CHI Conference on …, 2024 - dl.acm.org
As large language models (LLMs) become more advanced and impactful, it is increasingly
important to scrutinize the data that they rely upon and produce. What is it to be a dataset …

Assessing how accurately large language models encode and apply the Common European Framework of Reference for Languages

L Benedetto, G Gaudeau, A Caines, P Buttery - Computers and Education …, 2025 - Elsevier
Large Language Models (LLMs) can have a transformative effect on a variety of
domains, including education, and it is therefore pressing to understand whether these …

Natural language outlines for code: Literate programming in the LLM era

K Shi, D Altınbüken, S Anand, M Christodorescu… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose using natural language outlines as a novel modality and interaction surface for
providing AI assistance to developers throughout the software development process. An NL …

JailbreakHunter: a visual analytics approach for jailbreak prompts discovery from large-scale human-LLM conversational datasets

Z Jin, S Liu, H Li, X Zhao, H Qu - arXiv preprint arXiv:2407.03045, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention but also raised concerns
due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards …

LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models

M Kahng, I Tenney, M Pushkarna… - … on Visualization and …, 2024 - ieeexplore.ieee.org
Evaluating large language models (LLMs) presents unique challenges. While automatic
side-by-side evaluation, also known as LLM-as-a-judge, has become a promising solution …