American stories: A large-scale structured text dataset of historical US newspapers
Existing full text datasets of US public domain newspapers do not recognize the often
complex layouts of newspaper scans, and as a result the digitized content scrambles texts …
A world of fiction: Digital collections and the future of literary history
K Bode - 2019 - library.oapen.org
During the 19th century, throughout the Anglophone world, most fiction was first published in
periodicals. In Australia, newspapers were not only the main source of periodical fiction, but …
The visual digital turn: Using neural networks to study historical images
Digital humanities research has focused primarily on the analysis of texts. This emphasis
stems from the availability of technology to study digitized text. Optical character recognition …
The equivalence of “close” and “distant” reading; or, toward a new object for data-rich literary history
K Bode - Modern Language Quarterly, 2017 - read.dukeupress.edu
The approaches to data-rich literary history that dominate academic and public debate—
Franco Moretti's “distant reading” and Matthew Jockers's “macroanalysis”—model literary …
"Q i-jtb the Raven": Taking Dirty OCR Seriously
R Cordell - Book History, 2017 - muse.jhu.edu
This article argues that scholars must understand mass digitized texts as assemblages of
new editions, subsidiary editions, and impressions of their historical sources, and that these …
Language resources for historical newspapers: the Impresso collection
Following decades of massive digitization, an unprecedented amount of historical document
facsimiles can now be retrieved and accessed via cultural heritage online portals. If this …
The reuse of texts in Finnish newspapers and journals, 1771–1920: A digital humanities perspective
The digital collections of newspapers have given rise to a growing interest in studying them
with computational methods. This article contributes to this discussion by presenting a …
The Newspaper Navigator dataset: extracting and analyzing visual content from 16 million historic newspaper pages in Chronicling America
BCG Lee, J Mears, E Jakeway, M Ferriter… - arXiv preprint arXiv …, 2020 - arxiv.org
Chronicling America is a product of the National Digital Newspaper Program, a partnership
between the Library of Congress and the National Endowment for the Humanities to digitize …
Efficient OCR for building a diverse digital history
Many users consult digital archives daily, but the information they can access is
unrepresentative of the diversity of documentary history. The sequence-to-sequence …
Noise-robust de-duplication at scale
Identifying near duplicates within large, noisy text corpora has a myriad of applications that
range from de-duplicating training datasets, reducing privacy risk, and evaluating test set …