Factcheck-bench: Fine-grained evaluation benchmark for automatic fact-checkers

Y Wang, RG Reddy, ZM Mujahid, A Arora… - arXiv preprint arXiv …, 2023 - arxiv.org
The increased use of large language models (LLMs) across a variety of real-world
applications calls for mechanisms to verify the factual accuracy of their outputs. In this work …

DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts

T Braun, M Rothermel, M Rohrbach… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of disinformation presents a growing threat to societal trust and democracy,
necessitating robust and scalable Fact-Checking systems. In this work, we present Dynamic …

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

H Iqbal, Y Wang, M Wang, G Georgiev, J Geng… - arXiv preprint arXiv …, 2024 - arxiv.org
The increased use of large language models (LLMs) across a variety of real-world
applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs …

Loki: An open-source tool for fact verification

H Li, X Han, H Wang, Y Wang, M Wang, R Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Loki, an open-source tool designed to address the growing problem of
misinformation. Loki adopts a human-centered approach, striking a balance between the …

LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content

MB Kmainasi, AE Shahroor, M Hasanain… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable success as general-
purpose task solvers across various fields, including NLP, healthcare, finance, and law …

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Y Tan, B Zheng, B Zheng, K Cao, H Jing, J Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid advancement of Large Language Models (LLMs), significant safety concerns
have emerged. Fundamentally, the safety of large language models is closely linked to the …

CORE: Robust Factual Precision Scoring with Informative Sub-Claim Identification

Z Jiang, J Zhang, N Weir, S Ebner, M Wanner, K Sanders… - arXiv preprint arXiv …, 2024 - katesanders9.github.io
Hallucinations, the generation of untrue claims, pose a challenge to the application of
large language models (LLMs) [12], thereby motivating the development of metrics to …

Automatic Fact-Checking with Frame-Semantics

J Devasier, R Mediratta, A Putta, C Li - arXiv preprint arXiv:2501.13288, 2025 - arxiv.org
We propose a novel paradigm for automatic fact-checking that leverages frame semantics to
enhance the structured understanding of claims, addressing the challenges posed by …

Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?

Q Hu, Q Long, W Wang - arXiv preprint arXiv:2411.02400, 2024 - arxiv.org
Fact-checking pipelines increasingly adopt the Decompose-Then-Verify paradigm, where
texts are broken down into smaller claims for individual verification and subsequently …

Model Selection for HERITAGE-AI: Evaluating LLMs for Contextual Data Analysis of Maryland's Domestic Traffic Ads (1824–1864)

RK Gnanasekaran, L Perine, M Conrad… - … Conference on Big …, 2024 - ieeexplore.ieee.org
The HERITAGE-AI (Harnessing Enhanced Research and Instructional Technologies for
Archival Generative Exploration using AI), as part of the IMLS grant initiative, GenAI-4 …