G-Hadoop: MapReduce across distributed data centers for data-intensive computing

L Wang, J Tao, R Ranjan, H Marten, A Streit… - Future Generation …, 2013 - Elsevier
Recently, the computational requirements for large-scale data-intensive analysis of scientific
data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron …

[BOOK] Essentials of cloud computing

K Chandrasekaran - 2014 - books.google.com
Cloud computing—accessing computing resources over the Internet—is rapidly changing
the landscape of information technology. Its primary benefits compared to on-premise …

An overview of the open science data cloud

RL Grossman, Y Gu, J Mambretti, M Sabala… - Proceedings of the 19th …, 2010 - dl.acm.org
The Open Science Data Cloud is a distributed cloud based infrastructure for managing,
analyzing, archiving and sharing scientific datasets. We introduce the Open Science Data …

Data-intensive cloud computing: requirements, expectations, challenges, and solutions

J Shamsi, MA Khojaye, MA Qasmi - Journal of Grid Computing, 2013 - Springer
Data-intensive systems encompass terabytes to petabytes of data. Such systems require
massive storage and intensive computational power in order to execute complex queries …

An improved partitioning mechanism for optimizing massive data analysis using MapReduce

K Slagter, CH Hsu, YC Chung, D Zhang - The Journal of Supercomputing, 2013 - Springer
In the era of Big Data, huge amounts of structured and unstructured data are being produced
daily by a myriad of ubiquitous sources. Big Data is difficult to work with and requires …

Data and task parallelism in ILP using MapReduce

A Srinivasan, TA Faruquie, S Joshi - Machine Learning, 2012 - Springer
Nearly two decades of research in the area of Inductive Logic Programming (ILP) have seen
steady progress in clarifying its theoretical foundations and regular demonstrations of its …

[PDF] Data-Intensive Computing on Grid Computing Environment

P Raina, H Shah - International Journal of Open Publication and … - researchgate.net
Grid computing raises challenging issues in many areas of computer science,
bioinformatics, high energy physics and especially in the area of distributed computing, as …

An adaptive and memory efficient sampling mechanism for partitioning in MapReduce

K Slagter, CH Hsu, YC Chung - International Journal of Parallel …, 2015 - Springer
Big Data refers to the massive amounts of structured and unstructured data being produced
every day from a wide range of sources. Big Data is difficult to work with and needs a large …

Security in data intensive computing systems

EB Fernandez - Handbook of Data Intensive Computing, 2011 - Springer
Many applications, e.g., scientific computing, weather prediction, medical image processing,
require the manipulation of large amounts of data. Analysis of web traffic, sales, travel, and …

Understanding scientific applications for cloud environments

S Jha, DS Katz, A Luckow, A Merzky… - Cloud computing …, 2011 - Wiley Online Library
Distributed systems and their specific incarnations have evolved significantly over the years.
Most often, these evolutionary steps have been a consequence of external technology …