Knowledge graphs

A Hogan, E Blomqvist, M Cochez, C d'Amato… - ACM Computing …, 2021 - dl.acm.org
In this article, we provide a comprehensive introduction to knowledge graphs, which have
recently garnered significant attention from both industry and academia in scenarios that …

Revealing the vectors of cellular identity with single-cell genomics

A Wagner, A Regev, N Yosef - Nature biotechnology, 2016 - nature.com
Single-cell genomics has now made it possible to create a comprehensive atlas of human
cells. At the same time, it has reopened definitions of a cell's identity and of the ways in …

Apache Spark: a unified engine for big data processing

M Zaharia, RS Xin, P Wendell, T Das… - Communications of the …, 2016 - dl.acm.org
Apache Spark: a unified engine for big data processing. Communications of the ACM, November
2016, Vol. 59, No. 11, p. 56. Contributed articles. DOI:10.1145/2934664 This …

Social big data: Recent achievements and new challenges

G Bello-Orgaz, JJ Jung, D Camacho - Information Fusion, 2016 - Elsevier
Big data has become an important issue for a large number of research areas such as data
mining, machine learning, computational intelligence, information fusion, the semantic Web …

Spark SQL: Relational data processing in Spark

M Armbrust, RS Xin, C Lian, Y Huai, D Liu… - Proceedings of the …, 2015 - dl.acm.org
Spark SQL is a new module in Apache Spark that integrates relational processing with
Spark's functional programming API. Built on our experience with Shark, Spark SQL lets …

Big data analytics on Apache Spark

S Salloum, R Dautov, X Chen, PX Peng… - International Journal of …, 2016 - Springer
Apache Spark has emerged as the de facto framework for big data analytics with its
advanced in-memory programming model and upper-level libraries for scalable machine …

Sparrow: distributed, low latency scheduling

K Ousterhout, P Wendell, M Zaharia… - Proceedings of the twenty …, 2013 - dl.acm.org
Large-scale data analytics frameworks are shifting towards shorter task durations and larger
degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete …

Simba: Efficient in-memory spatial analytics

D Xie, F Li, B Yao, G Li, L Zhou, M Guo - Proceedings of the 2016 …, 2016 - dl.acm.org
Large spatial data becomes ubiquitous. As a result, it is critical to provide fast, scalable, and
high-throughput spatial queries and analytics for numerous applications in location-based …

Efficient coflow scheduling without prior knowledge

M Chowdhury, I Stoica - ACM SIGCOMM Computer Communication …, 2015 - dl.acm.org
Inter-coflow scheduling improves application-level communication performance in data-
parallel clusters. However, existing efficient schedulers require a priori coflow information …

From big data to big data mining: challenges, issues, and opportunities

D Che, M Safran, Z Peng - … conference on database systems for advanced …, 2013 - Springer
While “big data” has become a highlighted buzzword since last year, “big data mining”, i.e.,
mining from big data, has almost immediately followed up as an emerging, interrelated …