Why does surprisal from larger transformer-based language models provide a poorer fit to human reading times?

BD Oh, W Schuler - Transactions of the Association for Computational Linguistics, 2023 - direct.mit.edu
This work presents a linguistic analysis into why larger Transformer-based pre-trained
language models with more parameters and lower perplexity nonetheless yield surprisal …

Large-scale evidence for logarithmic effects of word predictability on reading time

C Shain, C Meister, T Pimentel, R Cotterell… - Proceedings of the National Academy of Sciences, 2024 - pnas.org
During real-time language comprehension, our minds rapidly decode complex meanings
from sequences of words. The difficulty of doing so is known to be related to words' …
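
The finding named in the title can be stated compactly. As a sketch of the standard formulation only (the intercept and slope below are illustrative free parameters, not estimates from the paper):

```latex
% Surprisal of word w_t given its preceding context (Hale, 2001; Levy, 2008):
\[
  s(w_t) \;=\; -\log_2 p\left(w_t \mid w_1, \ldots, w_{t-1}\right)
\]
% A "logarithmic effect" of predictability means reading time is linear in
% surprisal, hence logarithmic in probability; \alpha and \beta are
% illustrative intercept and slope parameters:
\[
  \mathrm{RT}(w_t) \;\approx\; \alpha + \beta\, s(w_t)
\]
```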

fMRI reveals language-specific predictive coding during naturalistic sentence comprehension

C Shain, IA Blank, M van Schijndel, W Schuler… - Neuropsychologia, 2020 - Elsevier
Much research in cognitive neuroscience supports prediction as a canonical computation of
cognition across domains. Is such predictive coding implemented by feedback from higher …

Robust effects of working memory demand during naturalistic language comprehension in language-selective cortex

C Shain, IA Blank, E Fedorenko, E Gibson… - Journal of Neuroscience, 2022 - jneurosci.org
To understand language, we must infer structured meanings from real-time auditory or visual
signals. Researchers have long focused on word-by-word structure building in working …

Comparison of structural parsers and neural language models as surprisal estimators

BD Oh, C Clark, W Schuler - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
Expectation-based theories of sentence processing posit that processing difficulty is
determined by predictability in context. While predictability quantified via surprisal has …
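
Comparisons like this one hinge on extracting per-word surprisal estimates from each candidate model. Below is a minimal sketch of the neural-LM side of that recipe, using the Hugging Face transformers API; the choice of GPT-2 is an assumption for illustration, not the set of models the paper evaluates, and its structural parsers estimate surprisal differently.

```python
import math

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice (an assumption, not the paper's model set).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids   # (1, seq_len)
    with torch.no_grad():
        logits = model(ids).logits                         # (1, seq_len, vocab)
    # Log-probability assigned to each actual token given its prefix.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]
    token_lp = log_probs[torch.arange(next_ids.size(0)), next_ids]
    # Surprisal = -log2 p(token | prefix); the model returns natural logs.
    bits = (-token_lp / math.log(2)).tolist()
    return list(zip(tokenizer.convert_ids_to_tokens(next_ids.tolist()), bits))

print(token_surprisals("The horse raced past the barn fell."))
```

For word-level reading-time regressions, subword surprisals are summed within each word, since log probabilities are additive over a word's subword tokens.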

Language model quality correlates with psychometric predictive power in multiple languages

EG Wilcox, CI Meister, R Cotterell… - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023 - research-collection.ethz.ch
Surprisal theory (Hale, 2001; Levy, 2008) posits that a word's reading time is proportional to
its surprisal (i.e., to its negative log probability given the preceding context). Since we are …

The plausibility of sampling as an algorithmic theory of sentence processing

JL Hoover, M Sonderegger, ST Piantadosi… - Open Mind, 2023 - direct.mit.edu
Words that are more surprising given context take longer to process. However, no
incremental parsing algorithm has been shown to directly predict this phenomenon. In this …

Incremental language comprehension difficulty predicts activity in the language network but not the multiple demand network

L Wehbe, IA Blank, C Shain, R Futrell, R Levy… - Cerebral Cortex, 2021 - academic.oup.com
What role do domain-general executive functions play in human language comprehension?
To address this question, we examine the relationship between behavioral measures of …

How reliable are standard reading time analyses? Hierarchical bootstrap reveals substantial power over-optimism and scale-dependent Type I error inflation

ZJ Burchill, TF Jaeger - Journal of Memory and Language, 2024 - Elsevier
We investigate the statistical power and Type I error rate of the two most common
approaches to reading time (RT) analyses: assuming normality of residuals and …
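
For concreteness, a hierarchical bootstrap resamples at every level of grouping before recomputing the statistic of interest. The following is a minimal two-level sketch (subjects, then trials within subject) with toy data and a grand-mean statistic; the data layout and the statistic are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def hierarchical_bootstrap(rts_by_subject: dict[str, np.ndarray],
                           n_boot: int = 2000) -> np.ndarray:
    """Two-level bootstrap of the grand mean reading time: resample
    subjects with replacement, then resample each sampled subject's
    trials with replacement."""
    subjects = list(rts_by_subject)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sampled = rng.choice(subjects, size=len(subjects), replace=True)
        subject_means = [
            rng.choice(rts_by_subject[s], size=len(rts_by_subject[s]),
                       replace=True).mean()
            for s in sampled
        ]
        stats[b] = np.mean(subject_means)
    return stats

# Toy data: 20 subjects, 100 trials each, roughly log-normal RTs in ms.
data = {f"s{i}": rng.lognormal(mean=6.0, sigma=0.3, size=100) for i in range(20)}
boot = hierarchical_bootstrap(data)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"grand mean RT: {boot.mean():.1f} ms, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Resampling trials alone would ignore between-subject variability and typically yields overconfident intervals; respecting the grouping structure is what lets the bootstrap serve as a benchmark for the power and Type I error rates investigated here.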

A synchronized multimodal neuroimaging dataset for studying brain language processing

S Wang, X Zhang, J Zhang, C Zong - Scientific Data, 2022 - nature.com
We present a synchronized multimodal neuroimaging dataset for studying brain language
processing (SMN4Lang) that contains functional magnetic resonance imaging (fMRI) and …