[BOOK] Modern information retrieval
R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …
Detecting near-duplicates for web crawling
GS Manku, A Jain, A Das Sarma - … of the 16th international conference on …, 2007 - dl.acm.org
Near-duplicate web documents are abundant. Two such documents differ from each other in
a very small portion that displays advertisements, for example. Such differences are …
Efficient similarity joins for near-duplicate detection
C …
Keeping found things found: The study and practice of personal information management
W Jones - 2010 - books.google.com
Keeping Found Things Found: The Study and Practice of Personal Information Management
is the first comprehensive book on new 'favorite child' of R&D at Microsoft and elsewhere …
Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD)
An effective approach to phishing Web page detection is proposed, which uses Earth
mover's distance (EMD) to measure Web page visual similarity. We first convert the involved …
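The snippet names Earth mover's distance (EMD) as the visual-similarity measure. As a minimal sketch only, assuming one-dimensional histograms over the same bins with unit spacing (the paper's actual page signatures and ground distance are not shown in this listing), EMD reduces to the L1 distance between the two cumulative distributions. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms over the same bins.

    Hypothetical helper for illustration. Assumes unit spacing between
    adjacent bins; each histogram is normalized to total mass 1 first.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive total mass")
    # Cumulative mass up to each bin boundary; the gap between the two
    # cumulative curves is the mass that must flow across that boundary.
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))
```

For example, `emd_1d([1, 0, 0], [0, 0, 1])` is 2.0: one unit of mass moves two bins. Real visual-similarity work typically uses multi-dimensional signatures, which need a general transportation solver rather than this 1-D shortcut.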
Methods for identifying versioned and plagiarized documents
The widespread use of on‐line publishing of text promotes storage of multiple versions of
documents and mirroring of documents in multiple locations, and greatly simplifies the task …
Methods and systems for quick and efficient data management and/or processing
C Dubnicki, E Kruus, C Ungureanu - US Patent 8,214,517, 2012 - Google Patents
Top-k set similarity joins
Similarity join is a useful primitive operation underlying many applications, such as near-duplicate
Web page detection, data integration, and pattern recognition. Traditional similarity …
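To make the "top-k set similarity join" primitive concrete, here is a brute-force baseline, not the cited paper's algorithm: it scores every pair of sets with Jaccard similarity and keeps the k highest-scoring pairs. The choice of Jaccard as the similarity function and the helper names `jaccard` and `topk_set_similarity_join` are assumptions for illustration.

```python
from heapq import nlargest
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def topk_set_similarity_join(records, k):
    """Return the k record pairs with the highest Jaccard similarity.

    Hypothetical brute-force baseline: compares all O(n^2) pairs and applies
    no pruning. Each result is a tuple (similarity, i, j) with i < j.
    """
    scored = (
        (jaccard(records[i], records[j]), i, j)
        for i, j in combinations(range(len(records)), 2)
    )
    return nlargest(k, scored)
```

For `records = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"a", "b", "c", "d"}]` and `k = 2`, the result is `[(0.75, 1, 3), (0.75, 0, 3)]`. Practical top-k join algorithms avoid the quadratic scan by pruning pairs that cannot beat the current k-th best score.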
Mining query logs: Turning search usage data into knowledge
F Silvestri - Foundations and Trends® in Information …, 2009 - nowpublishers.com
Web search engines have stored in their logs information about users since they started to
operate. This information often serves many purposes. The primary focus of this survey is on …