Extracting logical hierarchical structure of HTML documents based on headings

T Manabe, K Tajima - Proceedings of the VLDB Endowment, 2015 - dl.acm.org
We propose a method for extracting logical hierarchical structure of HTML documents.
Because mark-up structure in HTML documents does not necessarily coincide with logical …

Beyond generic summarization: A multi-faceted hierarchical summarization corpus of large heterogeneous data

C Tauchmann, T Arnold, A Hanselowski… - Proceedings of the …, 2018 - aclanthology.org
Automatic summarization has so far focused on datasets of ten to twenty rather short
documents, typically news articles. But automatic systems could in theory analyze hundreds …

First-order logic rule induction for information extraction in web resources

JI Fernández-Villamor, CA Iglesias… - International Journal on …, 2012 - World Scientific
Information extraction out of web pages, commonly known as screen scraping, is usually
performed through wrapper induction, a technique that is based on the internal structure of …

Jura: Towards automatic compliance assessment for annual reports of listed companies

Z Xu, Y Cao, R Cao, G Li, X Liu, Y Pang… - Proceedings of the 30th …, 2021 - dl.acm.org
The initial public offering (IPO) market in Hong Kong is consistently one of the largest in the
world. As part of its regulatory responsibilities, Hong Kong Exchanges and Clearing Limited …

Revisiting web data extraction using in-browser structural analysis and visual cues in modern web designs

A Murolo, MC Norrie - … : 16th International Conference, ICWE 2016, Lugano …, 2016 - Springer
Recent trends in website design have an impact on methods used for web data extraction.
Many existing methods rely on structural analysis of web pages and, with the introduction of …

Hierarchy identification for automatically generating table-of-contents

N Erbs, I Gurevych, T Zesch - Proceedings of the International …, 2013 - aclanthology.org
A table-of-contents (TOC) provides a quick reference to a document's content and structure.
We present the first study on identifying the hierarchical structure for automatically …

Approaches to Automatic Text Structuring

N Erbs - 2015 - tuprints.ulb.tu-darmstadt.de
Structured text helps readers to better understand the content of documents. In classic
newspaper texts or books, some structure already exists. In the Web 2.0, the amount of …

Web Search Based on Hierarchical Heading-Block Structure Analysis

T Manabe - 2016 - repository.kulib.kyoto-u.ac.jp
Authors write headings for splitting a document into multiple semantic blocks of different
topics. A block may include some other blocks, and the blocks in a document compose …

Semantic Service Discovery Techniques for the composable web

JI Fernández Villamor - 2012 - oa.upm.es
This PhD thesis contributes to the problem of resource and service discovery in the context
of the composable web. In the current web, mashup technologies allow developers reusing …

Semantic Service Discovery Techniques for the composable web

JIF Villamor - 2012 - core.ac.uk
This PhD thesis contributes to the problem of resource and service discovery in the context
of the composable web. In the current web, mashup technologies allow developers reusing …