Big data analytic framework for organizational leverage

S Mathrani, X Lai - Applied Sciences, 2021 - mdpi.com
Web data have grown exponentially to reach zettabyte scales. Mountains of data come from
several online applications, such as e-commerce, social media, web and sensor-based …

Information quality assessment for data fusion systems

MA Becerra, C Tobón, AE Castro-Ospina… - Data, 2021 - mdpi.com
This paper provides a comprehensive description of the current literature on data fusion,
with an emphasis on Information Quality (IQ) and performance evaluation. This literature …

A decision-support framework for data anonymization with application to machine learning processes

L Caruccio, D Desiato, G Polese, G Tortora… - Information …, 2022 - Elsevier
The application of machine learning techniques to large and distributed data archives might
result in the disclosure of sensitive information about the data subjects. Data often contain …

GDPR compliant information confidentiality preservation in big data processing

L Caruccio, D Desiato, G Polese, G Tortora - IEEE Access, 2020 - ieeexplore.ieee.org
Nowadays, new laws and regulations, such as the European General Data Protection
Regulation (GDPR), require companies to define privacy policies complying with the …

Self-supervised and interpretable data cleaning with sequence generative adversarial networks

J Peng, D Shen, N Tang, T Liu, Y Kou, T Nie… - Proceedings of the …, 2022 - dl.acm.org
We study the problem of self-supervised and interpretable data cleaning, which
automatically extracts interpretable data repair rules from dirty data. In this paper, we …

Towards the efficient discovery of meaningful functional dependencies

Z Wei, S Link - Information Systems, 2023 - Elsevier
We propose the first framework for discovering the set of meaningful functional
dependencies from data. This set contains the true positives among the set of functional …

IndiBits: Incremental discovery of relaxed functional dependencies using bitwise similarity

B Breve, L Caruccio, S Cirillo… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
One of the main challenges in data profiling is to efficiently extract metadata from dynamic
information sources, by avoiding the processing of the whole dataset from scratch upon …

Dependency visualization in data stream profiling

B Breve, L Caruccio, S Cirillo, V Deufemia, G Polese - Big Data Research, 2021 - Elsevier
Data stream profiling concerns the automatic extraction of metadata from a data stream,
without having the possibility to store it. Among the metadata of interest, functional …

Efficient Differential Dependency Discovery

S Kuang, H Yang, Z Tan, S Ma - Proceedings of the VLDB Endowment, 2024 - dl.acm.org
Differential dependencies (DDs) are proposed to specify constraints on the differences
between values, where the semantics of difference can be "similar", "dissimilar" and beyond …

Efficient Relaxed Functional Dependency Discovery with Minimal Set Cover

X Ding, Y Liu, H Wang, C Wang, Y Song… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Assessing data quality through Functional Dependencies (FDs) is a crucial aspect of data
governance. However, with the diverse range of data sources and the exponential growth in …