On data lake architectures and metadata management

P Sawadogo, J Darmont - Journal of Intelligent Information Systems, 2021 - Springer
Over the past two decades, we have witnessed an exponential increase of data production
in the world. So-called big data generally come from transactional systems, and even more …

Data lakes: A survey of functions and systems

R Hai, C Koutras, C Quix… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Data lakes are becoming increasingly prevalent for Big Data management and data
analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses …

A universal approach for multi-model schema inference

P Koupil, S Hricko, I Holubová - Journal of Big Data, 2022 - Springer
The variety feature of Big Data, represented by multi-model data, has brought a new
dimension of complexity to all aspects of data management. The need to process a set of …

Darwin: A data platform for schema evolution management and data migration

U Störl, M Klettke - 2022 - epub.uni-regensburg.de
During the development of NoSQL-backed software, the database schema evolves
alongside the application code. Especially in agile development, new application releases …

NoSQL Schema Evolution and Data Migration: State-of-the-Art and Opportunities

U Störl, M Klettke, S Scherzinger - EDBT, 2020 - openproceedings.org
Recent position papers demand more schema flexibility, such as the ability to handle
variational data [3, 42]. Many agile software developers have long since turned towards …

A Systematic Review of Automated Classification for Simple and Complex Query SQL on NoSQL Database

RA Kadir, ESM Surin, MR Sarker - Computer Systems Science & …, 2024 - researchgate.net
A data lake (DL) denotes a vast reservoir or repository of
data. It accumulates substantial volumes of data and employs advanced analytics to …

Four generations in data engineering for data science: The past, presence and future of a field of science

M Klettke, U Störl - Datenbank-Spektrum, 2022 - Springer
Data-driven methods and data science are important scientific methods in many research
fields. All data science approaches require professional data engineering components. At …

An approach to extracting topic-guided views from the sources of a data lake

C Diamantini, P Lo Giudice, D Potena, E Storti… - Information Systems …, 2021 - Springer
In the last years, data lakes are emerging as an effective and an efficient support for
information and knowledge extraction from a huge amount of highly heterogeneous and …

Self-adapting data migration in the context of schema evolution in NoSQL databases

A Hillenbrand, U Störl, S Nabiyev, M Klettke - Distributed and Parallel …, 2022 - Springer
When NoSQL database systems are used in an agile software development setting, data
model changes occur frequently and thus, data is routinely stored in different versions. The …

Extracting JSON schemas with tagged unions

S Klessinger, M Klettke, U Störl… - arXiv preprint arXiv …, 2023 - arxiv.org
With data lakes and schema-free NoSQL document stores, extracting a descriptive schema
from JSON data collections is an acute challenge. In this paper, we target the discovery of …