Big scholarly data: A survey

F Xia, W Wang, TM Bekele, H Liu - IEEE Transactions on Big …, 2017 - ieeexplore.ieee.org
With the rapid growth of digital publishing, harvesting, managing, and analyzing scholarly
information have become increasingly challenging. The term Big Scholarly Data is coined …

CiteSeerX: AI in a digital library search engine

J Wu, KM Williams, HH Chen, M Khabsa, C Caragea… - AI Magazine, 2015 - ojs.aaai.org
CiteSeerX is a digital library search engine providing access to more than five million
scholarly documents with nearly a million users and millions of hits per day. We present key …

A survey of scholarly data visualization

J Liu, T Tang, W Wang, B Xu, X Kong, F Xia - IEEE Access, 2018 - ieeexplore.ieee.org
Scholarly information usually contains millions of raw data, such as authors, papers,
citations, as well as scholarly networks. With the rapid growth of the digital publishing and …

Towards building a scholarly big data platform: Challenges, lessons and opportunities

Z Wu, J Wu, M Khabsa, K Williams… - IEEE/ACM Joint …, 2014 - ieeexplore.ieee.org
We introduce a big data platform that provides various services for harvesting scholarly
information and enabling efficient scholarly applications. The core architecture of the …

PDFMEF: A multi-entity knowledge extraction framework for scholarly documents and semantic search

J Wu, J Killian, H Yang, K Williams… - Proceedings of the 8th …, 2015 - dl.acm.org
We introduce PDFMEF, a multi-entity knowledge extraction framework for scholarly
documents in the PDF format. It is implemented with a framework that encapsulates open …

CSTeller: Forecasting scientific collaboration sustainability based on extreme gradient boosting

W Wang, B Xu, J Liu, Z Cui, S Yu, X Kong, F Xia - World Wide Web, 2019 - Springer
The mechanism why two strange scholars become collaborators has been extensively
studied from the perspective of social network analysis. In academia, two scholars may …

Document type classification in online digital libraries

C Caragea, J Wu, S Gollapalli, C Giles - Proceedings of the AAAI …, 2016 - ojs.aaai.org
Online digital libraries make it easier for researchers to search for scientific information. They
have been proven as powerful resources in many data mining, machine learning and …

Improving researcher homepage classification with unlabeled data

SD Gollapalli, C Caragea, P Mitra… - ACM Transactions on the …, 2015 - dl.acm.org
A classifier that determines if a webpage is relevant to a specified set of topics comprises a
key component for focused crawling. Can a classifier that is tuned to perform well on training …

Online learning of deep hybrid architectures for semi-supervised categorization

AG Ororbia, D Reitter, J Wu, CL Giles - … 7-11, 2015, Proceedings, Part I 15, 2015 - Springer
A hybrid architecture is presented capable of online learning from both labeled and
unlabeled samples. It combines both generative and discriminative objectives to derive a …

A data cleaning method for CiteSeer dataset

Y Wang, H Zhang, Y Li, D Wang, Y Ma, T Zhou… - Web Information Systems …, 2016 - Springer
CiteSeer is considered as the first academic search engine that have been serving data for
almost twenty years. Recently, CiteSeer graciously makes all the data public, including raw …