Analysis and Interpretation of metagenomics data: an approach

GS Navgire, N Goel, G Sawhney, M Sharma… - Biological Procedures …, 2022 - Springer
Advances in next-generation sequencing technologies have accelerated the momentum of
metagenomic studies, which is increasing yearly. The metagenomics field is one of the …

A siren song of open source reproducibility, examples from machine learning

E Raff, AL Farris - Proceedings of the 2023 ACM Conference on …, 2023 - dl.acm.org
As reproducibility becomes a greater concern, conferences have largely converged to a
strategy of asking reviewers to indicate whether code was attached to a submission. This …

Data inventories for the modern age? Using data science to open government data

J Lane, E Gimeno, E Levitskaya… - Harvard Data Science …, 2022 - assets.pubpub.org
This article describes how data science techniques—machine learning and natural
language processing—can be used to open the black box of government data. It then …

A coreset learning reality check

F Lu, E Raff, J Holt - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Subsampling algorithms are a natural approach to reduce data size before fitting models on
massive datasets. In recent years, several works have proposed methods for subsampling …

Challenges in using ML for networking research: How to label if you must

Y Lavinia, R Durairajan, R Rejaie… - … on Network Meets AI & ML, 2020 - dl.acm.org
Leveraging innovations in Machine Learning (ML) research is of great current interest to
researchers across the sciences, including networking research. However, using ML for …

What Do Machine Learning Researchers Mean by "Reproducible"?

E Raff, M Benaroch, S Samtani, AL Farris - arXiv preprint arXiv …, 2024 - arxiv.org
The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a
"reproducibility crisis" has spurred significant research in the past few years. Yet with each …

Caliban: Docker-based job manager for reproducible workflows

S Ritchie, A Slone, V Ramasesh - Journal of Open Source Software, 2020 - joss.theoj.org
Caliban is a command line tool that helps researchers launch and track their numerical
experiments in an isolated, reproducible computing environment. It was developed by …

Refactoring Machine Learning

AS Ross, JZ Forde - Workshop on Critiquing and Correcting Trends …, 2018 - asross.github.io
Results in machine learning scholarship are sometimes based on untested, difficult-to-read
code that has only been seen by a single researcher. We argue that this is bad, and that …