[BOOK] Modern information retrieval
R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …
Ubicrawler: A scalable fully distributed web crawler
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler,
using the Java programming language. The main features of UbiCrawler are platform …
Effective web crawling
C Castillo - ACM SIGIR Forum, 2005 - dl.acm.org
The key factors for the success of the World Wide Web are its large size and the lack of a
centralized control over its contents. Both issues are also the most important source of …
Crawling a country: better strategies than breadth-first for web page ordering
This article compares several page ordering strategies for Web crawling under several
metrics. The objective of these strategies is to download the most "important" pages "early" …
Scalability challenges in web search engines
BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer
Continuous growth of the Web and user bases forces web search engine companies to
make costly investments on very large compute infrastructures. The scalability of these …
An investigation of web crawler behavior: characterization and metrics
In this paper, we present a characterization study of search-engine crawlers. For the
purposes of our work, we use Web-server access logs from five academic sites in three …
BioCrawler: An intelligent crawler for the semantic web
A Batzios, C Dimou, AL Symeonidis… - Expert Systems with …, 2008 - Elsevier
Web crawling has become an important aspect of web search, as the WWW keeps getting
bigger and search engines strive to index the most important and up to date content. Many …
Architecture of a grid-enabled Web search engine
BB Cambazoglu, E Karaca, T Kucukyilmaz… - Information processing & …, 2007 - Elsevier
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running
on the grid infrastructure. It offers a personalized, on-demand, country-specific, category …
Smart distributed web crawler
SK Bal, G Geetha - 2016 International Conference on …, 2016 - ieeexplore.ieee.org
Centralized crawlers are not adequate to spider meaningful and relevant portions of the
Web. A crawler with good scalability and load balancing can bring growth to performance …
Intermediary infrastructures for the world wide web
MD Dikaiakos - Computer Networks, 2004 - Elsevier
Intermediaries are software entities, deployed on hosts of the wireline and wireless network,
that mediate the interaction between clients and servers of the World Wide Web. In this …