Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Semantic structure in deep learning

E Pavlick - Annual Review of Linguistics, 2022 - annualreviews.org
Deep learning has recently come to dominate computational linguistics, leading to claims of
human-level performance in a range of language processing tasks. Like much previous …

Learning from disagreement: A survey

AN Uma, T Fornaciari, D Hovy, S Paun, B Plank… - Journal of Artificial …, 2021 - jair.org
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …

Cumulative reasoning with large language models

Y Zhang, J Yang, Y Yuan, ACC Yao - arXiv preprint arXiv:2308.04371, 2023 - arxiv.org
While language models are powerful and versatile, they often fail to address highly complex
problems. This is because solving complex problems requires deliberate thinking, which has …

Folio: Natural language reasoning with first-order logic

S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have achieved remarkable performance on a variety of
natural language understanding tasks. However, existing benchmarks are inadequate in …

Inherent disagreements in human textual inferences

E Pavlick, T Kwiatkowski - Transactions of the Association for …, 2019 - direct.mit.edu
We analyze humans' disagreements about the validity of natural language inferences. We
show that, very often, disagreements are not dismissible as annotation “noise”, but rather …

What will it take to fix benchmarking in natural language understanding?

SR Bowman, GE Dahl - arXiv preprint arXiv:2104.02145, 2021 - arxiv.org
Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and
biased systems score so highly on standard benchmarks that there is little room for …

Stress test evaluation for natural language inference

A Naik, A Ravichander, N Sadeh, C Rose… - arXiv preprint arXiv …, 2018 - arxiv.org
Natural language inference (NLI) is the task of determining if a natural language hypothesis
can be inferred from a given premise in a justifiable manner. NLI was proposed as a …

Automated fact checking: Task formulations, methods and future directions

J Thorne, A Vlachos - arXiv preprint arXiv:1806.07687, 2018 - arxiv.org
The recently increased focus on misinformation has stimulated research in fact checking, the
task of assessing the truthfulness of a claim. Research in automating this task has been …

Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment

M Marelli, L Bentivogli, M Baroni… - Proceedings of the …, 2014 - aclanthology.org
This paper presents the task on the evaluation of Compositional Distributional Semantics
Models on full sentences organized for the first time within SemEval-2014. Participation was …