Rankreduce–processing k-nearest neighbor queries on top of mapreduce
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large data
sets where the index is jointly maintained by a set of machines in a computing cluster. The …
Max-cover in map-reduce
The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to
maximize the size of the union. This classic problem occurs commonly in many settings in …
MapReduce indexing strategies: Studying scalability and efficiency
In Information Retrieval (IR), the efficient indexing of terabyte-scale and larger corpora is still
a difficult problem. MapReduce has been proposed as a framework for distributing data …
An efficient data mining framework on Hadoop using Java persistence API
Y Lai, S ZhongZhi - 2010 10th IEEE International Conference …, 2010 - ieeexplore.ieee.org
Data indexing is common in data mining when working with high-dimensional, large-scale
data sets. Hadoop, a cloud computing project using the MapReduce framework in Java, has …
University of Glasgow at TREC 2009: Experiments with Terrier.
In TREC 2009, we extend our Voting Model for the faceted blog distillation, top stories
identification, and related entity finding tasks. Moreover, we experiment with our novel …
DH-TRIE frequent pattern mining on Hadoop using JPA
L Yang, Z Shi, LD Xu, F Liang… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
FPgrowth is a well-known frequent pattern algorithm in data mining when working with
high-dimensional, large-scale data sets. It is also known as great complexity on memory for …
BSP cost and scalability analysis for MapReduce operations
Data abundance poses the need for powerful and easy‐to‐use tools that support processing
large amounts of data. MapReduce has been increasingly adopted for over a decade by …
A new data mining algorithm based on MapReduce and Hadoop
H **nxiang, X Henan - Int. J. Signal Proc. Image Process. Pattern …, 2014 - academia.edu
The goal of data mining is to discover hidden useful information in large databases. Mining
frequent patterns from transaction databases is an important problem in data mining. As the …
Comparing distributed indexing: To MapReduce or not?
Information Retrieval (IR) systems require input corpora to be indexed. The advent of
terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we …
Indexing word sequences for ranked retrieval
Formulating and processing phrases and other term dependencies to improve query
effectiveness is an important problem in information retrieval. However, accessing word …