Open benchmarks for assessment of process monitoring and fault diagnosis techniques: A review and critical analysis

A Melo, MM Câmara, N Clavijo, JC Pinto - Computers & Chemical …, 2022 - Elsevier
The present paper brings together openly available datasets and simulators for testing of
process monitoring and fault diagnosis techniques. Some general characteristics of these …

Mitigating bias in radiology machine learning: 1. Data handling

P Rouzrokh, B Khosravi, S Faghani… - Radiology: Artificial …, 2022 - pubs.rsna.org
Minimizing bias is critical to adoption and implementation of machine learning (ML) in
clinical practice. Systematic mathematical biases produce consistent and reproducible …

CoInsight: Visual storytelling for hierarchical tables with connected insights

G Li, R Li, Y Feng, Y Zhang, Y Luo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Extracting data insights and generating visual data stories from tabular data are critical parts
of data analysis. However, most existing studies primarily focus on tabular data stored as flat …

SuperNOVA: Design strategies and opportunities for interactive visualization in computational notebooks

ZJ Wang, D Munechika, S Lee, DH Chau - Extended Abstracts of the CHI …, 2024 - dl.acm.org
Computational notebooks, such as Jupyter Notebook, have become data scientists' de facto
programming environments. Many visualization researchers and practitioners have …

Dead or alive: Continuous data profiling for interactive data science

W Epperson, V Gorantla, D Moritz… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Profiling data by plotting distributions and analyzing summary statistics is a critical step
throughout data analysis. Currently, this process is manual and tedious since analysts must …

Can large language models predict data correlations from column names?

I Trummer - Proceedings of the VLDB Endowment, 2023 - dl.acm.org
Recent publications suggest using natural language analysis on database schema
elements to guide tuning and profiling efforts. The underlying hypothesis is that state-of-the …

ydata-profiling: Accelerating data-centric AI with high-quality data

F Clemente, GM Ribeiro, A Quemy, MS Santos… - Neurocomputing, 2023 - Elsevier
ydata-profiling is an open-source Python package for advanced exploratory data
analysis that enables users to generate data profiling reports in a simple, fast, and efficient …

Advances in exploratory data analysis, visualisation and quality for data centric AI systems

H Patel, S Guttula, RS Mittal, N Manwani… - Proceedings of the 28th …, 2022 - dl.acm.org
It is widely accepted that data preparation is one of the most time-consuming steps of the
machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of …

DataPilot: Utilizing quality and usage information for subset selection during visual data preparation

A Narechania, F Du, AR Sinha, R Rossi… - Proceedings of the …, 2023 - dl.acm.org
Selecting relevant data subsets from large, unfamiliar datasets can be difficult. We address
this challenge by modeling and visualizing two kinds of auxiliary information: (1) quality–the …

Accelerating Lung Disease Diagnosis: The Role of Federated Learning and CNN in Multi-Institutional Collaboration

V **dal, V Kukreja, DP Singh, S Vats… - … on Intelligent Systems …, 2023 - ieeexplore.ieee.org
This research employs federated learning using Convolutional Neural Networks (CNN)
across multi-institutional datasets to classify the severity of lung disease. The project …