Agent-as-a-judge: Evaluate agents with agents

M Zhuge, C Zhao, D Ashley, W Wang… - ar…
… web-based GIS applications, commonly known as CyberGIS dashboards, for
querying and visualizing GIS data in environmental research often demands repetitive and …

Agent-as-a-Judge: Evaluating Agents with Agents

M Zhuge, C Zhao, DR Ashley, W Wang, D Khizbullin… - openreview.net
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes, ignoring the step-by-step nature of …