American stories: A large-scale structured text dataset of historical US newspapers

M Dell, J Carlson, T Bryan, E Silcock… - Advances in …, 2024 - proceedings.neurips.cc
Existing full text datasets of US public domain newspapers do not recognize the often
complex layouts of newspaper scans, and as a result the digitized content scrambles texts …

[BOOK] A world of fiction: Digital collections and the future of literary history

K Bode - 2019 - library.oapen.org
During the 19th century, throughout the Anglophone world, most fiction was first published in
periodicals. In Australia, newspapers were not only the main source of periodical fiction, but …

The visual digital turn: Using neural networks to study historical images

M Wevers, T Smits - Digital Scholarship in the Humanities, 2020 - academic.oup.com
Digital humanities research has focused primarily on the analysis of texts. This emphasis
stems from the availability of technology to study digitized text. Optical character recognition …

The equivalence of “close” and “distant” reading; or, toward a new object for data-rich literary history

K Bode - Modern Language Quarterly, 2017 - read.dukeupress.edu
The approaches to data-rich literary history that dominate academic and public debate—
Franco Moretti's “distant reading” and Matthew Jockers's “macroanalysis”—model literary …

"Q i-jtb the Raven": Taking Dirty OCR Seriously

R Cordell - Book History, 2017 - muse.jhu.edu
This article argues that scholars must understand mass digitized texts as assemblages of
new editions, subsidiary editions, and impressions of their historical sources, and that these …

Language resources for historical newspapers: the Impresso collection

M Ehrmann, M Romanello, S Clematide, PB Ströbel… - 2020 - zora.uzh.ch
Following decades of massive digitization, an unprecedented amount of historical document
facsimiles can now be retrieved and accessed via cultural heritage online portals. If this …

The reuse of texts in Finnish newspapers and journals, 1771–1920: A digital humanities perspective

H Salmi, P Paju, H Rantala, A Nivala… - Historical Methods: A …, 2020 - Taylor & Francis
The digital collections of newspapers have given rise to a growing interest in studying them
with computational methods. This article contributes to this discussion by presenting a …

The Newspaper Navigator dataset: extracting and analyzing visual content from 16 million historic newspaper pages in Chronicling America

BCG Lee, J Mears, E Jakeway, M Ferriter… - arXiv preprint arXiv …, 2020 - arxiv.org
Chronicling America is a product of the National Digital Newspaper Program, a partnership
between the Library of Congress and the National Endowment for the Humanities to digitize …

Efficient OCR for building a diverse digital history

J Carlson, T Bryan, M Dell - … of the 62nd Annual Meeting of the …, 2024 - aclanthology.org
Many users consult digital archives daily, but the information they can access is
unrepresentative of the diversity of documentary history. The sequence-to-sequence …

Noise-robust de-duplication at scale

E Silcock, L D'Amico-Wong, J Yang, M Dell - 2022 - nber.org
Identifying near duplicates within large, noisy text corpora has a myriad of applications that
range from de-duplicating training datasets, reducing privacy risk, and evaluating test set …