Inherent trade-offs between diversity and stability in multi-task benchmarks

G Zhang, M Hardt - arXiv preprint arXiv:2405.01719, 2024 - arxiv.org
We examine multi-task benchmarks in machine learning through the lens of social choice
theory. We draw an analogy between benchmarks and electoral systems, where models are …

SemEval-2022 Task 3: PreTENS-Evaluating Neural Networks on Presuppositional Semantic Knowledge

R Zamparelli, S Chowdhury, D Brunato… - Proceedings of the …, 2022 - aclanthology.org
We report the results of the SemEval 2022 Task 3, PreTENS, on evaluating the acceptability
of simple sentences containing constructions whose two arguments are presupposed to be …

MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing

VA Negru, R Vacareanu, C Lemnaru… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce MorphNLI, a modular step-by-step approach to natural language inference
(NLI). When classifying the premise-hypothesis pairs into {entailment, contradiction, neutral} …

Social commonsense reasoning with structured knowledge in text

D Paul - 2024 - archiv.ub.uni-heidelberg.de
Understanding a social situation requires the ability to reason about the underlying emotions
and behaviour of others. For example, when we read a personal story, we use our prior …

LLM-Cite: Cheap Fact Verification with Attribution via URL Generation

N Joshi, A Taly, D Muppalla - openreview.net
Hallucinations are one of the main issues with Large Language Models (LLMs). This has led
to increased interest in automated ways to verify the factuality of LLMs' responses. Existing …