CERMINE -- automatic extraction of metadata and references from scientific literature

D Tkaczyk, P Szostek, PJ Dendek… - 2014 11th IAPR …, 2014 - ieeexplore.ieee.org
CERMINE is a comprehensive open source system for extracting metadata and parsed
bibliographic references from scientific articles in born-digital form. The system is based on a …

Unsupervised document structure analysis of digital scientific articles

S Klampfl, M Granitzer, K Jack, R Kern - International journal on digital …, 2014 - Springer
Text mining and information retrieval in large collections of scientific literature require
automated processing systems that analyse the documents' content. However, the layout of …

FLAG-PDFe: Features oriented metadata extraction framework for scientific publications

MW Ahmed, MT Afzal - IEEE Access, 2020 - ieeexplore.ieee.org
The unprecedented growth of the research publications in diversified domains has
overwhelmed the research community. It requires a cumbersome process to extract this …

Data processing systems, devices, and methods for content analysis

R Tsibulevskiy, B Greenbaum - US Patent 9,223,769, 2015 - Google Patents
4,504,972 A 3/1985 Scherl et al. 5,073,953 A 12/1991 Westdijk 5,103,489 A 4/1992 Miette
5,111,408 A 5/1992 Amjadi 5,144,679 A 9/1992 Kakumoto et al. 5,159,667 A 10/1992 …

An unsupervised machine learning approach to body text and table of contents extraction from digital scientific articles

S Klampfl, R Kern - Research and Advanced Technology for Digital …, 2013 - Springer
Scientific articles are predominantly stored in digital document formats, which are optimised
for presentation, but lack structural information. This poses challenges to access the …

New methods for metadata extraction from scientific literature

D Tkaczyk - arXiv preprint arXiv:1710.10201, 2017 - arxiv.org
Within the past few decades we have witnessed digital revolution, which moved scholarly
communication to electronic media and also resulted in a substantial increase in its volume …

Ensemble imputation methods for missing software engineering data

B Twala, M Cartwright - 11th IEEE International Software …, 2005 - ieeexplore.ieee.org
One primary concern of software engineering is prediction accuracy. We use datasets to
build and validate prediction systems of software development effort, for example. However …

Insights to the state-of-the-art PDF Extraction Techniques

AM Hashmi, F Qayyum, MT Afzal - IPSI Trans. Internet Res, 2020 - ipsitransactions.org
Digitized documents have become the omnipresent medium of information. A plethora of
scholarly documents on the web is excessively being increased. Various digital libraries …

A hybrid strategy to extract metadata from scholarly articles by utilizing support vector machine and heuristics

M Waqas, N Anjum, MT Afzal - Scientometrics, 2023 - Springer
The immense growth in online research publications has attracted the research community
to extract valuable information from scientific resources by exploring online digital libraries …

Large scale citation matching using Apache Hadoop

M Fedoryszak, D Tkaczyk, Ł Bolikowski - … on Theory and Practice of Digital …, 2013 - Springer
During the process of citation matching links from bibliography entries to referenced
publications are created. Such links are indicators of topical similarity between linked texts …