Web mining research: A survey
R Kosala, H Blockeel - ACM Sigkdd Explorations Newsletter, 2000 - dl.acm.org
With the huge amount of information available online, the World Wide Web is a fertile area
for data mining research. The Web mining research is at the cross road of research from …
Roadrunner: Towards automatic data extraction from large web sites
The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …
Generating finite-state transducers for semi-structured data extraction from the web
CN Hsu, MT Dung - Information systems, 1998 - Elsevier
Integrating a large number of Web information sources may significantly increase the utility
of the World-Wide Web. A promising solution to the integration is through the use of a Web …
Visual web information extraction with lixto
We present new techniques for supervised wrapper generation and automated web
information extraction, and a system called Lixto implementing these techniques. Our system …
XWRAP: An XML-enabled wrapper construction system for web information sources
The paper describes the methodology and the software development of XWRAP, an XML-
enabled wrapper construction system for semi-automatic generation of wrapper programs …
A hierarchical approach to wrapper induction
With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …
Form-based ontology creation and information harvesting
Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …
Conceptual-model-based data extraction from multiple-record web pages
DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier
Electronically available data on the Web is exploding at an ever increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …
Hierarchical wrapper induction for semistructured information sources
With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …
Record-boundary discovery in Web documents
DW Embley, Y Jiang, YK Ng - Proceedings of the 1999 ACM SIGMOD …, 1999 - dl.acm.org
Extraction of information from unstructured or semistructured Web documents often requires
a recognition and delimitation of records. (By “record” we mean a group of information …