A survey of open source tools for machine learning with big data in the Hadoop ecosystem

S Landset, TM Khoshgoftaar, AN Richter, T Hasanin - Journal of Big Data, 2015 - Springer
With an ever-increasing number of options, the task of selecting machine learning tools for
big data can be difficult. The available tools have advantages and drawbacks, and many …

Applications of big data to smart cities

E Al Nuaimi, H Al Neyadi, N Mohamed… - Journal of Internet …, 2015 - Springer
Many governments are considering adopting the smart city concept in their cities and
implementing big data applications that support smart city components to reach the required …

Social big data: Recent achievements and new challenges

G Bello-Orgaz, JJ Jung, D Camacho - Information Fusion, 2016 - Elsevier
Big data has become an important issue for a large number of research areas such as data
mining, machine learning, computational intelligence, information fusion, the semantic Web …

Spark SQL: Relational data processing in Spark

M Armbrust, RS Xin, C Lian, Y Huai, D Liu… - Proceedings of the …, 2015 - dl.acm.org
Spark SQL is a new module in Apache Spark that integrates relational processing with
Spark's functional programming API. Built on our experience with Shark, Spark SQL lets …

Apache Flink: Stream and batch processing in a single engine

P Carbone, A Katsifodimos, S Ewen, V Markl… - The Bulletin of the …, 2015 - diva-portal.org
Apache Flink is an open-source system for processing streaming and batch data. Flink is
built on the philosophy that many classes of data processing applications, including real …

The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing

T Akidau, R Bradshaw, C Chambers… - Proceedings of the …, 2015 - dl.acm.org
Unbounded, unordered, global-scale datasets are increasingly common in day-to-day
business (e.g., Web logs, mobile usage statistics, and sensor networks). At the same time …

State management in Apache Flink®: consistent stateful distributed stream processing

P Carbone, S Ewen, G Fóra, S Haridi… - Proceedings of the …, 2017 - dl.acm.org
Stream processors are emerging in industry as an apparatus that drives analytical but also
mission critical services handling the core of persistent application logic. Thus, apart from …

SystemML: Declarative machine learning on Spark

M Boehm, MW Dusenberry, D Eriksson… - Proceedings of the …, 2016 - dl.acm.org
The rising need for custom machine learning (ML) algorithms and the growing data sizes
that require the exploitation of distributed, data-parallel frameworks such as MapReduce or …

The big data system, components, tools, and technologies: a survey

TR Rao, P Mitra, R Bhatt, A Goswami - Knowledge and Information …, 2019 - Springer
The traditional databases are not capable of handling unstructured data and high volumes
of real-time datasets. Diverse, unstructured datasets lead to big data, and it is laborious …

A survey on IoT big data analytic systems: Current and future

Y Sasaki - IEEE Internet of Things Journal, 2021 - ieeexplore.ieee.org
The Internet of Things (IoT) has become widespread around the world. Since a large
number of diverse devices, such as vehicles, household electrical appliances, smart …