Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Operationalizing machine learning: An interview study

S Shankar, R Garcia, JM Hellerstein… - arXiv preprint …
… and operating ML applications leads to a
variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software …

Data science through the looking glass: Analysis of millions of GitHub notebooks and ML.NET pipelines

F Psallidas, Y Zhu, B Karlas, J Henkel… - ACM SIGMOD …, 2022 - dl.acm.org
The recent success of machine learning (ML) has led to an explosive growth of systems and
applications built by an ever-growing community of system builders and data science (DS) …

Lima: Fine-grained lineage tracing and reuse in machine learning systems

A Phani, B Rath, M Boehm - … of the 2021 International Conference on …, 2021 - dl.acm.org
Machine learning (ML) and data science workflows are inherently exploratory. Data
scientists pose hypotheses, integrate the necessary data, and run ML pipelines of data …

Spade: Synthesizing data quality assertions for large language model pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - Proceedings of the …, 2024 - dl.acm.org
Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …

Towards observability for production machine learning pipelines

S Shankar, A Parameswaran - arXiv preprint arXiv:2108.13557, 2021 - arxiv.org
Software organizations are increasingly incorporating machine learning (ML) into their
product offerings, driving a need for new data management tools. Many of these tools …

Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance

A Chapman, L Lauro, P Missier, R Torlone - ACM Transactions on …, 2024 - dl.acm.org
Successful data-driven science requires complex data engineering pipelines to clean,
transform, and alter data in preparation for machine learning, and robust results can only be …

Visualising data science workflows to support third-party notebook comprehension: An empirical study

D Ramasamy, C Sarasua, A Bacchelli… - Empirical Software …, 2023 - Springer
Data science is an exploratory and iterative process that often leads to complex and
unstructured code. This code is usually poorly documented and, consequently, hard to …

" We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning

S Shankar, R Garcia, JM Hellerstein… - Proceedings of the …, 2024 - dl.acm.org
Organizations rely on machine learning engineers (MLEs) to deploy models and maintain
ML pipelines in production. Due to models' extensive reliance on fresh data, the …