A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs

BP Allen, PT Groth - arXiv preprint arXiv:2502.02896, 2025 - arxiv.org
Evaluating large language models (LLMs) for tasks like fact extraction in support of
knowledge graph construction frequently involves computing accuracy metrics using a …