Large language model supply chain: A research agenda
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …
Operationalizing machine learning: An interview study
S Shankar, R Garcia, JM Hellerstein… - … and operating ML applications leads to a
variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software …
Data science through the looking glass: Analysis of millions of GitHub notebooks and ML.NET pipelines
The recent success of machine learning (ML) has led to an explosive growth of systems and
applications built by an ever-growing community of system builders and data science (DS) …
LIMA: Fine-grained lineage tracing and reuse in machine learning systems
Machine learning (ML) and data science workflows are inherently exploratory. Data
scientists pose hypotheses, integrate the necessary data, and run ML pipelines of data …
SPADE: Synthesizing data quality assertions for large language model pipelines
Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …
Towards observability for production machine learning pipelines
Software organizations are increasingly incorporating machine learning (ML) into their
product offerings, driving a need for new data management tools. Many of these tools …
Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance
Successful data-driven science requires complex data engineering pipelines to clean,
transform, and alter data in preparation for machine learning, and robust results can only be …
Visualising data science workflows to support third-party notebook comprehension: An empirical study
Data science is an exploratory and iterative process that often leads to complex and
unstructured code. This code is usually poorly documented and, consequently, hard to …
"We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning
Organizations rely on machine learning engineers (MLEs) to deploy models and maintain
ML pipelines in production. Due to models' extensive reliance on fresh data, the …