- Academic Search

Web mining research: A survey

R Kosala, H Blockeel - ACM Sigkdd Explorations Newsletter, 2000 - dl.acm.org

With the huge amount of information available online, the World Wide Web is a fertile area
for data mining research. The Web mining research is at the cross road of research from …

Cited by 2608

Roadrunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org

The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

Cited by 1595

Generating finite-state transducers for semi-structured data extraction from the web

CN Hsu, MT Dung - Information systems, 1998 - Elsevier

Integrating a large number of Web information sources may significantly increase the utility
of the World-Wide Web. A promising solution to the integration is through the use of a Web …

Cited by 724

Visual web information extraction with lixto

R Baumgartner, S Flesca, G Gottlob - 2001 - ora.ox.ac.uk

We present new techniques for supervised wrapper generation and automated web
information extraction, and a system called Lixto implementing these techniques. Our system …

Cited by 815

XWRAP: An XML-enabled wrapper construction system for web information sources

L Liu, C Pu, W Han - … of 16th International Conference on Data …, 2000 - ieeexplore.ieee.org

The paper describes the methodology and the software development of XWRAP, an XML-
enabled wrapper construction system for semi-automatic generation of wrapper programs …

Cited by 773

A hierarchical approach to wrapper induction

I Muslea, S Minton, C Knoblock - … of the third annual conference on …, 1999 - dl.acm.org

With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Cited by 626

Form-based ontology creation and information harvesting

DW Embley, C Tao, SW Liddle - US Patent 8,103,962, 2012 - Google Patents

Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …

Cited by 314

Conceptual-model-based data extraction from multiple-record web pages

DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier

Electronically available data on the Web is exploding at an ever increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …

Cited by 553

Hierarchical wrapper induction for semistructured information sources

I Muslea, S Minton, CA Knoblock - Autonomous Agents and Multi-Agent …, 2001 - Springer

With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Cited by 545

Record-boundary discovery in Web documents

DW Embley, Y Jiang, YK Ng - Proceedings of the 1999 ACM SIGMOD …, 1999 - dl.acm.org

Extraction of information from unstructured or semistructured Web documents often requires
a recognition and delimitation of records.(By “record” we mean a group of information …

Cited by 472
