Big scholarly data: A survey
With the rapid growth of digital publishing, harvesting, managing, and analyzing scholarly
information have become increasingly challenging. The term Big Scholarly Data is coined …
CiteSeerX: AI in a digital library search engine
CiteSeerX is a digital library search engine providing access to more than five million
scholarly documents with nearly a million users and millions of hits per day. We present key …
A survey of scholarly data visualization
Scholarly information usually contains millions of raw data, such as authors, papers,
citations, as well as scholarly networks. With the rapid growth of the digital publishing and …
Towards building a scholarly big data platform: Challenges, lessons and opportunities
We introduce a big data platform that provides various services for harvesting scholarly
information and enabling efficient scholarly applications. The core architecture of the …
PDFMEF: A multi-entity knowledge extraction framework for scholarly documents and semantic search
We introduce PDFMEF, a multi-entity knowledge extraction framework for scholarly
documents in the PDF format. It is implemented with a framework that encapsulates open …
CSTeller: Forecasting scientific collaboration sustainability based on extreme gradient boosting
The mechanism why two strange scholars become collaborators has been extensively
studied from the perspective of social network analysis. In academia, two scholars may …
Document type classification in online digital libraries
Online digital libraries make it easier for researchers to search for scientific information. They
have been proven as powerful resources in many data mining, machine learning and …
Improving researcher homepage classification with unlabeled data
A classifier that determines if a webpage is relevant to a specified set of topics comprises a
key component for focused crawling. Can a classifier that is tuned to perform well on training …
Online learning of deep hybrid architectures for semi-supervised categorization
A hybrid architecture is presented capable of online learning from both labeled and
unlabeled samples. It combines both generative and discriminative objectives to derive a …
A data cleaning method for CiteSeer dataset
Y Wang, H Zhang, Y Li, D Wang, Y Ma, T Zhou… - Web Information Systems …, 2016 - Springer
CiteSeer is considered as the first academic search engine that have been serving data for
almost twenty years. Recently, CiteSeer graciously makes all the data public, including raw …