Data provenance for cloud forensic investigations, security, challenges, solutions and future perspectives: A survey

OI Abiodun, M Alawida, AE Omolara… - Journal of King Saud …, 2022 - Elsevier
It is extremely difficult to track down the original source of sensitive data from a variety of
sources in the cloud during transit and processing. For instance, data provenance, which …

Workflow provenance in the lifecycle of scientific machine learning

R Souza, LG Azevedo, V Lourenço… - Concurrency and …, 2022 - Wiley Online Library
Machine learning (ML) has already fundamentally changed several businesses.
More recently, it has also been profoundly impacting the computational science and …

DfAnalyzer: runtime dataflow analysis tool for computational science and engineering applications

V Silva, V Campos, T Guedes, J Camata, D de Oliveira… - SoftwareX, 2020 - Elsevier
DfAnalyzer is a tool for monitoring, debugging, and analyzing dataflows generated by
Computational Science and Engineering (CSE) applications. It collects strategic raw data …

Raw data queries during data-intensive parallel workflow execution

V Silva, J Leite, JJ Camata, D De Oliveira… - Future Generation …, 2017 - Elsevier
Computer simulations consume and produce huge amounts of raw data files presented in
different formats, e.g., HDF5 in computational fluid dynamics simulations. Users often need to …

Towards optimizing the execution of Spark scientific workflows using machine learning‐based parameter tuning

D de Oliveira, F Porto, C Boeres… - Concurrency and …, 2021 - Wiley Online Library
In the last few years, Apache Spark has become the de facto standard framework for big
data systems in both industry and academic projects. Spark is used to execute compute‐and …

Interactive data exploration of distributed raw files: A systematic mapping study

A Álvarez-Ayllón, M Palomo-Duarte, JM Dodero - IEEE Access, 2018 - ieeexplore.ieee.org
When exploring big amounts of data without a clear target, providing an interactive
experience becomes really difficult, since this tentative inspection usually defeats any early …

Data reduction in scientific workflows using provenance monitoring and user steering

R Souza, V Silva, ALGA Coutinho, P Valduriez… - Future Generation …, 2020 - Elsevier
Scientific workflows need to be iteratively, and often interactively, executed for large input
datasets. Reducing data from input datasets is a powerful way to reduce overall execution …

Position Paper on Dataset Engineering to Accelerate Science

EV Brazil, E Soares, LV Real, L Azevedo… - arXiv preprint arXiv …, 2023 - arxiv.org
Data is a critical element in any discovery process. In the last decades, we observed
exponential growth in the volume of available data and the technology to manipulate it …

PresQ: Discovery of Multidimensional Equally-Distributed Dependencies via Quasi-Cliques on Hypergraphs

A Álvarez-Ayllón, M Palomo-Duarte… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cross-matching data stored on separate files is an everyday activity in the scientific domain.
However, sometimes the relation between attributes may not be obvious. The discovery of …

In situ data steering on sedimentation simulation with provenance data

V Silva, J Camata, D De Oliveira… - … Storage and Analysis, 2016 - researchgate.net
(AMR) are optimal strategies for tackling large-scale simulations. libMesh is an open-source
finite-element library that supports parallel AMR and is used in multiphysics applications. In …