Machine knowledge: Creation and curation of comprehensive knowledge bases

G Weikum, XL Dong, S Razniewski… - … and Trends® in …, 2021 - nowpublishers.com
Equipping machines with comprehensive knowledge of the world's entities and their
relationships has been a longstanding goal of AI. Over the last decade, large-scale …

A systematic review of machine learning techniques for stance detection and its applications

N Alturayeif, H Luqman, M Ahmed - Neural Computing and Applications, 2023 - Springer
Stance detection is an evolving opinion mining research area motivated by the vast increase
in the variety and volume of user-generated content. In this regard, considerable research …

FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Evaluating verifiability in generative search engines

NF Liu, T Zhang, P Liang - arXiv preprint arXiv:2304.09848, 2023 - arxiv.org
Generative search engines directly generate responses to user queries, along with in-line
citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e. …

ERASER: A benchmark to evaluate rationalized NLP models

J DeYoung, S Jain, NF Rajani, E Lehman… - arXiv preprint arXiv …, 2019 - arxiv.org
State-of-the-art models in NLP are now predominantly based on deep neural networks that
are opaque in terms of how they come to make predictions. This limitation has increased …

Fact or fiction: Verifying scientific claims

D Wadden, S Lin, K Lo, LL Wang, M van Zuylen… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce scientific claim verification, a new task to select abstracts from the research
literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to …

Evaluating models' local decision boundaries via contrast sets

M Gardner, Y Artzi, V Basmova, J Berant… - arXiv preprint arXiv …, 2020 - arxiv.org
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …

CLINE: Contrastive learning with semantic negative examples for natural language understanding

D Wang, N Ding, P Li, HT Zheng - arXiv preprint arXiv:2107.00440, 2021 - arxiv.org
Although pre-trained language models have proven useful for learning high-quality semantic
representations, these models are still vulnerable to simple perturbations. Recent works …

MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims

I Augenstein, C Lioma, D Wang, LC Lima… - arXiv preprint arXiv …, 2019 - arxiv.org
We contribute the largest publicly available dataset of naturally occurring factual claims for
the purpose of automatic claim verification. It is collected from 26 fact checking websites in …

Identifying the human values behind arguments

J Kiesel, M Alshomary, N Handke, X Cai… - Proceedings of the …, 2022 - aclanthology.org
This paper studies the (often implicit) human values behind natural language arguments,
such as to have freedom of thought or to be broadminded. Values are commonly accepted …