Regular expression learning for information extraction

Y Li, R Krishnamurthy, S Raghavan… - Proceedings of the …, 2008 - aclanthology.org
Regular expressions have served as the dominant workhorse of practical information
extraction for several years. However, there has been little work on reducing the manual …

Searching the enterprise

U Kruschwitz, C Hull - Foundations and Trends® in …, 2017 - nowpublishers.com
Search has become ubiquitous but that does not mean that search has been solved.
Enterprise search, which is broadly speaking the use of information retrieval technology to …

Examining the limits of crowdsourcing for relevance assessment

P Clough, M Sanderson, J Tang… - IEEE Internet …, 2012 - ieeexplore.ieee.org
Evaluation is instrumental to developing and managing effective information retrieval
systems. For this process, enlisting crowdsourcing has proven viable. However, less …

Joining extractions of regular expressions

DD Freydenberger, B Kimelfeld… - Proceedings of the 37th …, 2018 - dl.acm.org
Regular expressions with capture variables, also known as "regex formulas," extract relations
of spans (interval positions) from text. These relations can be further manipulated via the …

Grammars for document spanners

L Peterfreund - arXiv preprint arXiv:2003.06880, 2020 - arxiv.org
We propose a new grammar-based language for defining information-extractors from
documents (text) that is built upon the well-studied framework of document spanners for …

Enterprise search in the big data era: Recent developments and open challenges

Y Li, Z Liu, H Zhu - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
Enterprise search allows users in an enterprise to retrieve desired information through a
simple search interface. It is widely viewed as an important productivity tool within an …

Recursive programs for document spanners

L Peterfreund, B Cate, R Fagin, B Kimelfeld - arXiv preprint arXiv …, 2017 - arxiv.org
A document spanner models a program for Information Extraction (IE) as a function that
takes as input a text document (string over a finite alphabet) and produces a relation of …

Declarative cleaning of inconsistencies in information extraction

R Fagin, B Kimelfeld, F Reiss… - ACM Transactions on …, 2016 - dl.acm.org
The population of a predefined relational schema from textual content, commonly known as
Information Extraction (IE), is a pervasive task in contemporary computational challenges …

Detection of boilerplate content

J Zeng, Y Li, BR Murphy, Y Shen - US Patent 8,898,296, 2014 - Google Patents
Methods, systems, and apparatus, including computer programs encoded on computer
storage media, for generating query recommendations. One method provides selecting one …

Cleaning inconsistencies in information extraction via prioritized repairs

R Fagin, B Kimelfeld, F Reiss… - Proceedings of the 33rd …, 2014 - dl.acm.org
The population of a predefined relational schema from textual content, commonly known as
Information Extraction (IE), is a pervasive task in contemporary computational challenges …