Knowledge graphs: A practical review of the research landscape

M Kejriwal - Information, 2022 - mdpi.com
Knowledge graphs (KGs) have rapidly emerged as an important area in AI over the last ten
years. Building on a storied tradition of graphs in the AI community, a KG may be simply …

WebFormer: The web-page transformer for structure information extraction

Q Wang, Y Fang, A Ravula, F Feng, X Quan… - Proceedings of the ACM …, 2022 - dl.acm.org
Structure information extraction refers to the task of extracting structured text fields from web
pages, such as extracting a product offer from a shopping page including product title …

NAS-BERT: Task-agnostic and adaptive-size BERT compression with neural architecture search

J Xu, X Tan, R Luo, K Song, J Li, T Qin… - Proceedings of the 27th …, 2021 - dl.acm.org
While pre-trained language models (e.g., BERT) have achieved impressive results on
different natural language processing tasks, they have large numbers of parameters and …

Spatial dependency parsing for semi-structured document information extraction

W Hwang, J Yim, S Park, S Yang, M Seo - arXiv preprint arXiv:2005.00642, 2020 - arxiv.org
Information Extraction (IE) for semi-structured document images is often approached as a
sequence tagging problem by classifying each recognized input token into one of the IOB …
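The snippet refers to the common sequence-tagging formulation, in which each token gets an IOB tag (B- begins a field, I- continues it, O is outside any field) and fields are then read off the tag sequence. The sketch below shows only that decoding step for the baseline formulation, not the method proposed in the paper. It assumes tokens arrive in a single reading order and that fields are flat and non-overlapping. The function name `iob_to_spans` and the field labels are hypothetical.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) fields from IOB tags such as 'B-item', 'I-item', 'O'.

    Hypothetical helper: assumes flat, non-overlapping fields in reading order.
    """
    spans: List[Tuple[str, List[str]]] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Token is outside any field; close the open span.
            current_label = None
            continue
        prefix, _, label = tag.partition("-")
        # Start a new span on B-, or on an I- that does not continue the open span.
        if prefix == "B" or label != current_label:
            spans.append((label, [token]))
            current_label = label
        else:
            spans[-1][1].append(token)
    return [(label, " ".join(words)) for label, words in spans]


if __name__ == "__main__":
    tokens = ["Total", "12.50", "Iced", "Latte"]
    tags = ["O", "B-total", "B-item", "I-item"]
    print(iob_to_spans(tokens, tags))
```

Because the decoder only sees a linear token sequence, any field whose tokens are not contiguous in that order cannot be recovered. That limitation is inherent to this formulation of the task.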

MarkupLM: Pre-training of text and markup language for visually-rich document understanding

J Li, Y Xu, L Cui, F Wei - arXiv preprint arXiv:2110.08518, 2021 - arxiv.org
Multimodal pre-training with text, layout, and image has made significant progress for
Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such …

Data extraction via semantic regular expression synthesis

Q Chen, A Banerjee, Ç Demiralp, G Durrett… - Proceedings of the ACM …, 2023 - dl.acm.org
Many data extraction tasks of practical relevance require not only syntactic pattern matching
but also semantic reasoning about the content of the underlying text. While regular …

DOM-LM: Learning generalizable representations for HTML documents

X Deng, P Shiralkar, C Lockard, B Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
HTML documents are an important medium for disseminating information on the Web for
human consumption. An HTML document presents information in multiple text formats …

Simplified DOM trees for transferable attribute extraction from the web

Y Zhou, Y Sheng, N Vo, N Edmonds, S Tata - arXiv preprint arXiv …, 2021 - arxiv.org
There has been a steady need to precisely extract structured knowledge from the web (i.e.,
HTML documents). Given a web page, extracting a structured object along with various …

Web question answering with neurosymbolic program synthesis

Q Chen, A Lamoreaux, X Wang, G Durrett… - Proceedings of the …, 2021 - dl.acm.org
In this paper, we propose a new technique based on program synthesis for extracting
information from webpages. Given a natural language query and a few labeled webpages …

WIERT: web information extraction via render tree

Z Li, B Shao, L Shou, M Gong, G Li… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Web information extraction (WIE) is a fundamental problem in web document understanding,
with a significant impact on various applications. Visual information plays a crucial role in …