50 years of data science

D Donoho - Journal of Computational and Graphical Statistics, 2017 - Taylor & Francis
More than 50 years ago, John Tukey called for a reformation of academic statistics. In “The
Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science …

Reproducibility in scientific computing

P Ivie, D Thain - ACM Computing Surveys (CSUR), 2018 - dl.acm.org
Reproducibility is widely considered to be an essential requirement of the scientific process.
However, a number of serious concerns have been raised recently, questioning whether …

Data science at the singularity

D Donoho - Harvard Data Science Review, 2024 - assets.pubpub.org
Something fundamental to computation-based research has really changed in the last ten
years. In certain fields, progress is simply dramatically more rapid than previously …

Why linked data is not enough for scientists

S Bechhofer, I Buchan, D De Roure, P Missier… - Future Generation …, 2013 - Elsevier
Scientific data represents a significant portion of the linked open data cloud and scientists
stand to benefit from the data fusion capability this will afford. Publishing linked data into the …

Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance

Y Gil, CH David, I Demir, BT Essawy… - Earth and Space …, 2016 - Wiley Online Library
Geoscientists now live in a world rich with digital data and methods, and their computational
research cannot be fully captured in traditional publications. The Geoscience Paper of the …

noWorkflow: capturing and analyzing provenance of scripts

L Murta, V Braganholo, F Chirigati, D Koop… - … and Annotation of Data …, 2015 - Springer
We propose noWorkflow, a tool that transparently captures provenance of scripts and
enables reproducibility. Unlike existing approaches, noWorkflow is non-intrusive and does …

YesWorkflow: a user-oriented, language-independent tool for recovering workflow information from scripts

T McPhillips, T Song, T Kolisnik, S Aulenbach… - arXiv preprint arXiv …, 2015 - arxiv.org
Scientific workflow management systems offer features for composing complex
computational pipelines from modular building blocks, for executing the resulting automated …

50 years of Data Science

D Donoho - URL http://courses.csail.mit.edu/18, 2015 - smallake.kr
More than 50 years ago, John Tukey called for a reformation of academic statistics. In 'The
Future of Data Analysis', he pointed to the existence of an as-yet unrecognized science …

Bridging the chasm: A survey of software engineering practice in scientific programming

T Storer - ACM Computing Surveys (CSUR), 2017 - dl.acm.org
The use of software is pervasive in all fields of science. Associated software development
efforts may be very large, long lived, and complex, requiring the commitment of significant …

CODECHECK: an Open Science initiative for the independent execution of computations underlying research articles during peer review to improve reproducibility

D Nüst, SJ Eglen - F1000Research, 2021 - pmc.ncbi.nlm.nih.gov
The traditional scientific paper falls short of effectively communicating computational
research. To help improve this situation, we propose a system by which the computational …