An overview of platforms for big earth observation data management and analysis

VCF Gomes, GR Queiroz, KR Ferreira - Remote Sensing, 2020 - mdpi.com
In recent years, Earth observation (EO) satellites have generated big amounts of geospatial
data that are freely available for society and researchers. This scenario brings challenges for …

A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases

SF Kingsmore, LD Smith, CM Kunard… - The American Journal of …, 2022 - cell.com
Newborn screening (NBS) dramatically improves outcomes in severe childhood disorders
by treatment before symptom onset. In many genetic diseases, however, outcomes remain …

anndata: Annotated data

I Virshup, S Rybakov, FJ Theis, P Angerer, FA Wolf - BioRxiv, 2021 - biorxiv.org
Summary: anndata is a Python package for handling annotated data matrices in memory and
on disk (github.com/theislab/anndata), positioned between pandas and xarray. anndata …

[BOOK] Spatial data science: With applications in R

E Pebesma, R Bivand - 2023 - taylorfrancis.com
Spatial Data Science introduces fundamental aspects of spatial data that every data scientist
should know before they start working with spatial data. These aspects include how …

anndata: Access and store annotated data matrices

I Virshup, S Rybakov, FJ Theis, P Angerer… - Journal of Open Source …, 2024 - joss.theoj.org
Generating insight from high-dimensional data matrices typically works through training
models that annotate observations and variables via low-dimensional representations. In …

Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

Cloud-based storage and computing for remote sensing big data: a technical review

C Xu, X Du, X Fan, G Giuliani, Z Hu… - … Journal of Digital …, 2022 - Taylor & Francis
The rapid growth of remote sensing big data (RSBD) has attracted considerable attention
from both academia and industry. Despite the progress of computer technologies …

SeqArray—a storage-efficient high-performance data format for WGS variant calls

X Zheng, SM Gogarten, M Lawrence, A Stilp… - …, 2017 - academic.oup.com
Motivation: Whole-genome sequencing (WGS) data are being generated at an
unprecedented rate. Analysis of WGS data requires a flexible data format to store the …

VStore: A data store for analytics on large videos

T Xu, LM Botelho, FX Lin - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org
We present VStore, a data store for supporting fast, resource-efficient analytics over large
archival videos. VStore manages video ingestion, storage, retrieval, and consumption. It …

Lethe: A tunable delete-aware LSM engine

S Sarkar, TI Papon, D Staratzis… - Proceedings of the 2020 …, 2020 - dl.acm.org
Data-intensive applications fueled the evolution of log structured merge (LSM) based key-value
engines that employ the out-of-place paradigm to support high ingestion rates with low …