Web robot detection techniques: overview and limitations

D Doran, SS Gokhale - Data Mining and Knowledge Discovery, 2011 - Springer
Most modern Web robots that crawl the Internet to support value-added services and
technologies possess sophisticated data collection and analysis capabilities. Some of these …

Web robot detection: A probabilistic reasoning approach

A Stassopoulou, MD Dikaiakos - Computer Networks, 2009 - Elsevier
In this paper, we introduce a probabilistic modeling approach for addressing the problem of
Web robot detection from Web-server access logs. More specifically, we construct a …

Reinforcement learning based web crawler detection for diversity and dynamics

Y Gao, Z Feng, X Wang, M Song, X Wang, X Wang… - Neurocomputing, 2023 - Elsevier
Crawler detection is always an important research topic in network security. With the
development of web technology, crawlers are constantly updating and changing, and their …

Systems and methods of handling internet spiders

J Alexander - US Patent 7,987,173, 2011 - Google Patents
Aspects relate to identifying Internet spiders with an approach involving a plurality of
instances of one or more URLs, which reference resources available from a first domain …

Web robot detection based on pattern-matching technique

S Kwon, YG Kim, S Cha - Journal of Information Science, 2012 - journals.sagepub.com
In web robot detection it is important to find features that are common characteristics of
diverse robots, in order to differentiate between them and humans. Existing approaches …

Web robot detection based on monotonous behavior

S Kwon, M Oh, D Kim, J Lee, YG Kim… - Proceedings of the …, 2012 - academia.edu
Several studies examined various features on how to most effectively detect web robots.
Based on an insight that most web robots, regardless of specifics, would exhibit focused and …

Detection, classification, and workload analysis of web robots

D Doran - 2014 - digitalcommons.lib.uconn.edu
It has been traditionally believed that humans, who exhibit well-studied behaviors and
statistical regularities in their traffic, primarily generate the stream of traffic seen by Web …

Low-load server crawler: Design and evaluation

KT Nakahira, T Hoshino, Y Mikami - Proceedings of the 17th international …, 2008 - dl.acm.org
This paper proposes a method of crawling Web servers connected to the Internet without
imposing a high processing load. We are using the crawler for a field survey of the digital …

Challenges in Using Peer-to-Peer Structures in Order to Design a Large-Scale Web Search Engine

H Mousavi, A Movaghar - Advances in Computer Science and Engineering …, 2009 - Springer
One of the distributed solutions for scaling Web Search Engines (WSEs) may be peer-to-peer
(P2P) structures. P2P structures are successfully being used in many systems with …

Virtual evidence: analyze the footsteps of your users

W Shih - Journal of Hospital Librarianship, 2007 - Taylor & Francis
This paper presents a study of Web Crawler activities based upon Web access logs from the
Web site of an academic library. It further compares crawler behavior with that of regular …