An overview of web data clustering practices

A Vakali, J Pokorný, T Dalamagas - International conference on extending …, 2004 - Springer
Clustering is a challenging topic in the area of Web data management. Various forms of
clustering are required in a wide range of applications, including finding mirrored Web …

Method and system for processing documents through document history encapsulation

JY Vion-Dury - US Patent 9,448,986, 2016 - Google Patents
(57) ABSTRACT A computer-implemented system and method for processing a markup
language document and its change history are provided. The method includes receiving first …

A change detection system for unordered XML data using a relational model

S Sundaram, SK Madria - Data & Knowledge Engineering, 2012 - Elsevier
The dramatic increase in the evolution of XML data available on the Internet requires a
change detection system to keep track of important changes occurring during their life time …

Integration of web sources under uncertainty and dependencies using probabilistic XML

ML Ba, S Montenez, R Tang, T Abdessalem - Database Systems for …, 2014 - Springer
We study in this vision paper the problem of integrating several web data sources under
uncertainty and dependencies. We present a concrete application with web sources about …

Accurate and efficient html differencing

R Mikhaiel, E Stroulia - 13th IEEE International Workshop on …, 2005 - ieeexplore.ieee.org
Recognizing the differences between subsequent versions of HTML documents is an
important problem. It is useful for managers of multi-authored Web sites who need to review …

Merging Uncertain Multi-Version XML Documents.

ML Ba, T Abdessalem, P Senellart - DChanges, 2013 - ceur-ws.org
Merging is a fundamental operation in revision control systems that enables integrating
different changes made to the same documents. In open platforms, such as Wikipedia …

Difference computation using change identification techniques for structured web documents

J Arora, KR Ramkumar - IOP Conference Series: Materials …, 2021 - iopscience.iop.org
In this era of the competitive world, one needs to stay updated with all the information that is
required for their professional and personal growth. But due to vast information, it is difficult …

XML Diff and patch tool

K Komvoteas - MS in Distributed Multimedia and Information Systems …, 2003 - Citeseer
The increasing use of XML over the last few years has led to the creation of many differencing and
patching tools capable of handling tree-structured documents. However, all of those tools …

An incrementally trainable statistical approach to information extraction based on token classification and rich context models

C Siefkes - 2007 - refubium.fu-berlin.de
Most of the information stored in digital form is hidden in natural language (NL) texts. While
information retrieval (IR) helps to locate documents which might contain the facts needed …

Diffing, patching and merging XML documents: toward a generic calculus of editing deltas.

JY Vion-Dury - Proceedings of the 10th ACM symposium on Document …, 2010 - dl.acm.org
This work addresses what we believe to be a central issue in the field of XML diff and merge
computation: the mathematical modeling of the so-called "editing deltas" and the study of …