[BOOK] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

UbiCrawler: A scalable fully distributed web crawler

P Boldi, B Codenotti, M Santini… - Software: Practice and …, 2004 - Wiley Online Library
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler,
using the Java programming language. The main features of UbiCrawler are platform …

Effective web crawling

C Castillo - ACM SIGIR Forum, 2005 - dl.acm.org
The key factors for the success of the World Wide Web are its large size and the lack of a
centralized control over its contents. Both issues are also the most important source of …

Crawling a country: better strategies than breadth-first for web page ordering

R Baeza-Yates, C Castillo, M Marin… - Special interest tracks …, 2005 - dl.acm.org
This article compares several page ordering strategies for Web crawling under several
metrics. The objective of these strategies is to download the most "important" pages "early" …

Scalability challenges in web search engines

BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer
Continuous growth of the Web and user bases forces web search engine companies to
make costly investments on very large compute infrastructures. The scalability of these …

An investigation of web crawler behavior: characterization and metrics

MD Dikaiakos, A Stassopoulou… - Computer …, 2005 - Elsevier
In this paper, we present a characterization study of search-engine crawlers. For the
purposes of our work, we use Web-server access logs from five academic sites in three …

BioCrawler: An intelligent crawler for the semantic web

A Batzios, C Dimou, AL Symeonidis… - Expert Systems with …, 2008 - Elsevier
Web crawling has become an important aspect of web search, as the WWW keeps getting
bigger and search engines strive to index the most important and up to date content. Many …

Architecture of a grid-enabled Web search engine

BB Cambazoglu, E Karaca, T Kucukyilmaz… - Information processing & …, 2007 - Elsevier
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running
on the grid infrastructure. It offers a personalized, on-demand, country-specific, category …

Smart distributed web crawler

SK Bal, G Geetha - 2016 International Conference on …, 2016 - ieeexplore.ieee.org
Centralized crawlers are not adequate to spider meaningful and relevant portions of the
Web. A crawler with good scalability and load balancing can bring growth to performance …

Intermediary infrastructures for the world wide web

MD Dikaiakos - Computer Networks, 2004 - Elsevier
Intermediaries are software entities, deployed on hosts of the wireline and wireless network,
that mediate the interaction between clients and servers of the World Wide Web. In this …