- Academic Search

M Dell, J Carlson, T Bryan, E Silcock… - Advances in …, 2024 - proceedings.neurips.cc

Existing full text datasets of US public domain newspapers do not recognize the often
complex layouts of newspaper scans, and as a result the digitized content scrambles texts …

Cited by 23

[BOOK] A world of fiction: Digital collections and the future of literary history

K Bode - 2019 - library.oapen.org

During the 19th century, throughout the Anglophone world, most fiction was first published in
periodicals. In Australia, newspapers were not only the main source of periodical fiction, but …

Cited by 172

The visual digital turn: Using neural networks to study historical images

M Wevers, T Smits - Digital Scholarship in the Humanities, 2020 - academic.oup.com

Digital humanities research has focused primarily on the analysis of texts. This emphasis
stems from the availability of technology to study digitized text. Optical character recognition …

Cited by 143

The equivalence of “close” and “distant” reading; or, toward a new object for data-rich literary history

K Bode - Modern Language Quarterly, 2017 - read.dukeupress.edu

The approaches to data-rich literary history that dominate academic and public debate—
Franco Moretti's “distant reading” and Matthew Jockers's “macroanalysis”—model literary …

Cited by 165

"Q i-jtb the Raven": Taking Dirty OCR Seriously

R Cordell - Book History, 2017 - muse.jhu.edu

This article argues that scholars must understand mass digitized texts as assemblages of
new editions, subsidiary editions, and impressions of their historical sources, and that these …

Cited by 108

Language resources for historical newspapers: the Impresso collection

M Ehrmann, M Romanello, S Clematide, PB Ströbel… - 2020 - zora.uzh.ch

Following decades of massive digitization, an unprecedented amount of historical document
facsimiles can now be retrieved and accessed via cultural heritage online portals. If this …

Cited by 47

The reuse of texts in Finnish newspapers and journals, 1771–1920: A digital humanities perspective

H Salmi, P Paju, H Rantala, A Nivala… - Historical Methods: A …, 2020 - Taylor & Francis

The digital collections of newspapers have given rise to a growing interest in studying them
with computational methods. This article contributes to this discussion by presenting a …

Cited by 39

The Newspaper Navigator dataset: extracting and analyzing visual content from 16 million historic newspaper pages in Chronicling America

BCG Lee, J Mears, E Jakeway, M Ferriter… - arXiv preprint arXiv …, 2020 - arxiv.org

Chronicling America is a product of the National Digital Newspaper Program, a partnership
between the Library of Congress and the National Endowment for the Humanities to digitize …

Cited by 43

Efficient OCR for building a diverse digital history

J Carlson, T Bryan, M Dell - … of the 62nd Annual Meeting of the …, 2024 - aclanthology.org

Many users consult digital archives daily, but the information they can access is
unrepresentative of the diversity of documentary history. The sequence-to-sequence …

Cited by 8

Noise-robust de-duplication at scale

E Silcock, L D'Amico-Wong, J Yang, M Dell - 2022 - nber.org

Identifying near duplicates within large, noisy text corpora has a myriad of applications that
range from de-duplicating training datasets, reducing privacy risk, and evaluating test set …

Cited by 14

Computational methods for uncovering reprinted texts in antebellum newspapers

American stories: A large-scale structured text dataset of historical US newspapers
