The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

[Book] Using relative entropy for detection and analysis of periods of diachronic linguistic change

S Degaetano-Ortlieb, E Teich - 2018 - publikationen.sulb.uni-saarland.de
We present a data-driven approach to detect periods of linguistic change and the lexical and
grammatical features contributing to change. We focus on the development of scientific …

Toward an optimal code for communication: The case of scientific English

S Degaetano-Ortlieb, E Teich - Corpus Linguistics and Linguistic …, 2022 - degruyter.com
We present a model of the linguistic development of scientific English from the mid-
seventeenth to the late-nineteenth century, a period that witnessed significant political and …

Linguistic variation and change in 250 years of English scientific writing: A data-driven approach

Y Bizzoni, S Degaetano-Ortlieb… - Frontiers in Artificial …, 2020 - frontiersin.org
We trace the evolution of Scientific English through the Late Modern period to modern time
on the basis of a comprehensive corpus composed of the Transactions and Proceedings of …

The Royal Society Corpus 6.0: Providing 300+ years of scientific writing for humanistic study

S Fischer, J Knappen, K Menzel… - Proceedings of the Twelfth …, 2020 - aclanthology.org
We present a new, extended version of the Royal Society Corpus (RSC), a diachronic
corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The …

An information-theoretic approach to modeling diachronic change in scientific English

S Degaetano-Ortlieb, H Kermes, A Khamis… - From data to evidence in …, 2018 - brill.com
We present an information-theoretic approach to investigate diachronic change in scientific
English. Our main assumption is that over time scientific English has become increasingly …

Introduction: Editorship and the editing of scientific journals, 1750–1950

A Fyfe, A Gielas - Centaurus, 2020 - Wiley Online Library
The editors of scientific journals are key gatekeepers for building careers and
communicating knowledge, but we know far less about them than about scientific authors …

[PDF] Working together towards an ideal infrastructure for language learner corpora

EW Stemle, A Boyd, M Jansen… - Learner Corpus …, 2019 - researchportal.helsinki.fi
In this article we provide an overview of first-hand experiences and vantage points for best
practices from projects in seven European countries dedicated to learner corpus research …

Tracing syntactic change in the scientific genre: Two Universal Dependency-parsed diachronic corpora of scientific English and German

MP Krielke, L Talamo, M Fawzi… - Proceedings of the …, 2022 - aclanthology.org
We present two comparable diachronic corpora of scientific English and German from the
Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe …

[PDF] Information-based modeling of diachronic linguistic change: from typicality to productivity

S Degaetano-Ortlieb, E Teich - … of the 10th SIGHUM Workshop on …, 2016 - aclanthology.org
We present a new approach for modeling diachronic linguistic change in grammatical
usage. We illustrate the approach on English scientific writing in Late Modern English …