Linear-complexity data-parallel earth mover's distance approximations

K Atasu, T Mittelholzer - International Conference on …, 2019 - proceedings.mlr.press
The Earth Mover's Distance (EMD) is a state-of-the-art metric for comparing discrete
probability distributions, but its high distinguishability comes at a high cost in computational …

A novel approximation to dynamic time warping allows anytime clustering of massive time series datasets

Q Zhu, G Batista, T Rakthanmanon, E Keogh - Proceedings of the 2012 SIAM …, 2012 - SIAM
Given the ubiquity of time series data, the data mining community has spent significant time
investigating the best time series similarity measure to use for various tasks and domains …

Binary sketches for secondary filtering

V Mic, D Novak, P Zezula - ACM Transactions on Information Systems …, 2018 - dl.acm.org
This article addresses the problem of matching the most similar data objects to a given query
object. We adopt a generic model of similarity that involves the domain of objects and metric …

Multiple feature fusion for social media applications

B Cui, AKH Tung, C Zhang, Z Zhao - Proceedings of the 2010 ACM …, 2010 - dl.acm.org
The emergence of social media as a crucial paradigm has posed new challenges to the
research and industry communities, where media are designed to be disseminated through …

A binary grey wolf optimizer to solve the scientific document summarization problem

R Das, D Debnath, P Pakray, NC Kumar - Multimedia Tools and …, 2024 - Springer
The extraction of information from the extensive volume of online textual data poses a
significant challenge, and text summarization plays a pivotal role in overcoming this …

Indexing the earth mover's distance using normal distributions

BE Ruttenberg, AK Singh - arXiv preprint arXiv:1111.7168, 2011 - arxiv.org
Querying uncertain data sets (represented as probability distributions) presents many
challenges due to the large amount of data involved and the difficulties comparing …

Distance-based similarity models for content-based multimedia retrieval

C Beecks - 2013 - publications.rwth-aachen.de
Concomitant with the digital information age, an increasing amount of multimedia data is
generated, processed, and finally stored in very large multimedia data collections. The …

An efficient and effective similarity measure to enable data mining of petroglyphs

Q Zhu, X Wang, E Keogh, SH Lee - Data Mining and Knowledge …, 2011 - Springer
Rock art is an archaeological term for human-made markings on stone, including carved
markings, known as petroglyphs, and painted markings, known as pictographs. It is believed …

Linear-complexity relaxed word Mover's distance with GPU acceleration

K Atasu, T Parnell, C Dünner, M Sifalakis… - … Conference on Big …, 2017 - ieeexplore.ieee.org
The amount of unstructured text-based data is growing every day. Querying, clustering, and
classifying this big data requires similarity computations across large sets of documents …

Robust set reconciliation

D Chen, C Konrad, K Yi, W Yu, Q Zhang - Proceedings of the 2014 ACM …, 2014 - dl.acm.org
Set reconciliation is a fundamental problem in distributed databases, where two parties each
holding a set of elements wish to find their difference, so as to establish data consistency …