Interactive and visual prompt engineering for ad-hoc task adaptation with large language models

H Strobelt, A Webson, V Sanh, B Hoover… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
State-of-the-art neural language models can now be used to solve ad-hoc language tasks
through zero-shot prompting without the need for supervised training. This approach has …

Visual comparison of language model adaptation

R Sevastjanova, E Cakmak, S Ravfogel… - … on Visualization and …, 2022 - ieeexplore.ieee.org
Neural language models are widely used; however, their model parameters often need to be
adapted to the specific domains and tasks of an application, which is time- and resource …

KnowledgeVis: Interpreting language models by comparing fill-in-the-blank prompts

A Coscia, A Endert - IEEE Transactions on Visualization and …, 2023 - ieeexplore.ieee.org
Recent growth in the popularity of large language models has led to their increased usage
for summarizing, predicting, and generating text, making it vital to help researchers and …

Mediators: Conversational agents explaining NLP model behavior

N Feldhus, AM Ravichandran, S Möller - arXiv preprint arXiv:2206.06029, 2022 - arxiv.org
The human-centric explainable artificial intelligence (HCXAI) community has raised the
need for framing the explanation process as a conversation between human and machine …

LLM Comparator: Visual analytics for side-by-side evaluation of large language models

M Kahng, I Tenney, M Pushkarna, MX Liu… - Extended Abstracts of …, 2024 - dl.acm.org
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the
quality of responses from large language models (LLMs). However, analyzing the results …

XAINES: Explaining AI with narratives

M Hartmann, H Du, N Feldhus, I Kruijff-Korbayová… - KI-Künstliche …, 2022 - Springer
Artificial Intelligence (AI) systems are increasingly pervasive: Internet of Things, in-car
intelligent devices, robots, and virtual assistants, and their large-scale adoption makes it …

LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models

M Kahng, I Tenney, M Pushkarna… - … on Visualization and …, 2024 - ieeexplore.ieee.org
Evaluating large language models (LLMs) presents unique challenges. While automatic
side-by-side evaluation, also known as LLM-as-a-judge, has become a promising solution …

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

M Boubdir, E Kim, B Ermis, M Fadaee… - arXiv preprint arXiv …, 2023 - arxiv.org
Human evaluation is increasingly critical for assessing large language models, capturing
linguistic nuances, and reflecting user preferences more accurately than traditional …

Interactive prompt debugging with sequence salience

I Tenney, R Mullins, B Du, S Pandya, M Kahng… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Sequence Salience, a visual tool for interactive prompt debugging with input
salience methods. Sequence Salience builds on widely used salience methods for text …

Visual Analytics for Generative Transformer Models

R Li, R Yang, W Xiao, A AbuRaed, G Murray… - arXiv preprint arXiv …, 2023 - arxiv.org
While transformer-based models have achieved state-of-the-art results in a variety of
classification and generation tasks, their black-box nature makes them challenging for …