Pass-join: A partition-based method for similarity joins

G Li, D Deng, J Wang, J Feng - arXiv preprint arXiv:1111.7171, 2011 - arxiv.org
As an essential operation in data cleaning, the similarity join has attracted considerable
attention from the database community. In this paper, we study string similarity joins with edit …
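The snippet stops before describing Pass-join's partition scheme, so the following is only a brute-force baseline for the problem it addresses: a string similarity join under edit distance, returning every pair (r, s) with ed(r, s) ≤ τ. It is not the Pass-join algorithm. The function names and sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            )
        prev = curr
    return prev[len(b)]


def naive_similarity_join(left, right, tau):
    """All pairs (r, s) from left x right with edit_distance(r, s) <= tau.

    The length filter skips pairs whose lengths differ by more than tau,
    because that difference is already a lower bound on the edit distance.
    """
    result = []
    for r in left:
        for s in right:
            if abs(len(r) - len(s)) > tau:
                continue
            if edit_distance(r, s) <= tau:
                result.append((r, s))
    return result


if __name__ == "__main__":
    names = ["vldb", "pvldb", "icde", "sigmod"]
    typos = ["vldbj", "icdm", "sigmdo"]
    print(naive_similarity_join(names, typos, tau=2))
    # prints [('vldb', 'vldbj'), ('pvldb', 'vldbj'), ('icde', 'icdm'), ('sigmod', 'sigmdo')]
```

The nested loop verifies every surviving pair, which is exactly the quadratic cost that filtering schemes for edit-distance joins aim to avoid.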

Ed-join: an efficient algorithm for similarity joins with edit distance constraints

C Xiao, W Wang, X Lin - Proceedings of the VLDB Endowment, 2008 - dl.acm.org
There has been considerable interest in similarity join in the research community recently.
Similarity join is a fundamental operation in many application areas, such as data integration …

Top-k set similarity joins

C Xiao, W Wang, X Lin, H Shang - 2009 IEEE 25th …, 2009 - ieeexplore.ieee.org
Similarity join is a useful primitive operation underlying many applications, such as near
duplicate Web page detection, data integration, and pattern recognition. Traditional similarity …
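The snippet is cut off before the method, so this sketch only illustrates the problem: instead of returning every pair above a fixed threshold, a top-k similarity join returns the k most similar pairs. It assumes Jaccard similarity over token sets and scores all pairs by brute force; it is not the paper's algorithm, and the helper names and records are hypothetical.

```python
import heapq
from itertools import combinations


def jaccard(x: frozenset, y: frozenset) -> float:
    """Jaccard similarity |x ∩ y| / |x ∪ y|; taken as 1.0 when both sets are empty."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def topk_self_join(records, k):
    """Return the k most similar pairs as (sim, i, j) with i < j, highest first.

    Brute-force baseline: every pair is scored and a size-k min-heap keeps
    the best k seen so far. No index or pruning is used.
    """
    heap = []  # min-heap: heap[0] is the weakest pair currently kept
    for i, j in combinations(range(len(records)), 2):
        sim = jaccard(records[i], records[j])
        if len(heap) < k:
            heapq.heappush(heap, (sim, i, j))
        elif sim > heap[0][0]:
            heapq.heapreplace(heap, (sim, i, j))
    return sorted(heap, reverse=True)


# Hypothetical token-set records.
records = [
    frozenset("near duplicate web page".split()),
    frozenset("near duplicate page".split()),
    frozenset("data integration".split()),
    frozenset("data integration tasks".split()),
    frozenset("pattern recognition".split()),
]
for sim, i, j in topk_self_join(records, k=2):
    print(f"{sim:.3f}", i, j)
# prints:
# 0.750 0 1
# 0.667 2 3
```

The heap's minimum acts as a rising threshold: once k pairs are held, only pairs that beat the current k-th best can enter.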

Fast-join: An efficient method for fuzzy token matching based string similarity join

J Wang, G Li, J Fe - 2011 IEEE 27th International Conference …, 2011 - ieeexplore.ieee.org
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …

Simple and efficient algorithm for approximate dictionary matching

N Okazaki, J Tsujii - … of the 23rd International Conference on …, 2010 - aclanthology.org
This paper presents a simple and efficient algorithm for approximate dictionary matching
designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. We …
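For reference, the four measures named in this snippet can be computed on feature sets as below. Using character trigrams without padding as the features is an assumption made for illustration; the snippet does not say which features the paper uses, and the helper names are hypothetical.

```python
import math


def char_ngrams(s: str, n: int = 3) -> set:
    """Set of overlapping character n-grams of s (no padding; assumed feature choice)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def set_similarities(x: set, y: set) -> dict:
    """Cosine, Dice, Jaccard and overlap coefficients for two non-empty sets."""
    inter = len(x & y)
    return {
        "cosine": inter / math.sqrt(len(x) * len(y)),
        "dice": 2 * inter / (len(x) + len(y)),
        "jaccard": inter / len(x | y),
        "overlap": inter / min(len(x), len(y)),
    }


sims = set_similarities(char_ngrams("similar"), char_ngrams("similarity"))
print({name: round(value, 3) for name, value in sims.items()})
# prints {'cosine': 0.791, 'dice': 0.769, 'jaccard': 0.625, 'overlap': 1.0}
```

All four depend only on |X ∩ Y| and the set sizes, which is why a single overlap count can serve any of them.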

Efficient approximate entity extraction with edit distance constraints

W Wang, C Xiao, X Lin, C Zhang - Proceedings of the 2009 ACM …, 2009 - dl.acm.org
Named entity recognition aims at extracting named entities from unstructured text. A recent
trend of named entity recognition is finding approximate matches in the text with respect to a …

Space-constrained gram-based indexing for efficient approximate string search

A Behm, S Ji, C Li, J Lu - 2009 IEEE 25th International …, 2009 - ieeexplore.ieee.org
Answering approximate queries on string collections is important in applications such as
data cleaning, query relaxation, and spell checking, where inconsistencies and errors exist …
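The snippet does not describe the paper's space-constrained scheme. As background only, this sketch shows a plain, unconstrained q-gram inverted index with the standard count filter for edit-distance queries: if ed(s, t) ≤ τ, then s and t share at least max(|s|, |t|) − q + 1 − τ·q q-grams, because one edit destroys at most q of them. The class name, q = 2 default, and sample strings are hypothetical.

```python
from collections import Counter, defaultdict


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the len(s) - q + 1 overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


class QGramIndex:
    """Inverted index from q-gram to postings of (string id, occurrence count)."""

    def __init__(self, strings, q=2):
        self.q = q
        self.strings = list(strings)
        self.postings = defaultdict(list)
        for sid, s in enumerate(self.strings):
            for gram, cnt in qgrams(s, q).items():
                self.postings[gram].append((sid, cnt))

    def candidates(self, query: str, tau: int):
        """Ids passing the length and count filters for ed <= tau.

        Survivors are only candidates; each still needs an exact
        edit-distance check.
        """
        shared = Counter()  # multiset overlap between query and each string
        for gram, qcnt in qgrams(query, self.q).items():
            for sid, scnt in self.postings.get(gram, ()):
                shared[sid] += min(qcnt, scnt)
        result = []
        for sid, s in enumerate(self.strings):
            if abs(len(s) - len(query)) > tau:
                continue  # length filter
            bound = max(len(s), len(query)) - self.q + 1 - tau * self.q
            if shared[sid] >= bound:  # count filter
                result.append(sid)
        return result


idx = QGramIndex(["approximate", "approximately", "appropriate",
                  "spell checking", "data cleaning"])
print(idx.candidates("aproximate", tau=1))  # prints [0]
```

Here "appropriate" passes the length filter but shares only 5 bigrams with the query against a required 8, so it is pruned without computing its edit distance.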

Attention guidance for immersive video content in head-mounted displays

F Danieau, A Guillo, R Doré - 2017 IEEE Virtual Reality (VR), 2017 - ieeexplore.ieee.org
Immersive videos allow users to freely explore 4π steradian scenes within head-mounted
displays (HMD), leading to a strong feeling of immersion. However, users may miss important …

Astrid: accurate selectivity estimation for string predicates using deep learning

S Shetiya, S Thirumuruganathan, N Koudas… - Proceedings of the …, 2020 - par.nsf.gov
Accurate selectivity estimation for string predicates is a long-standing research challenge in
databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) …

Cost-based variable-length-gram selection for string collections to support approximate queries efficiently

X Yang, B Wang, C Li - Proceedings of the 2008 ACM SIGMOD …, 2008 - dl.acm.org
Approximate queries on a collection of strings are important in many applications such as
record linkage, spell checking, and Web search, where inconsistencies and errors exist in …