Topic discovery via latent space clustering of pretrained language model representations
Topic models have been the prominent tools for automatic topic discovery from text corpora.
Despite their effectiveness, topic models suffer from several limitations including the inability …
The effect of metadata on scientific literature tagging: A cross-field cross-model study
Due to the exponential growth of scientific publications on the Web, there is a pressing need
to tag each paper with fine-grained topics so that researchers can track their interested fields …
Weakly-supervised scientific document classification via retrieval-augmented multi-stage training
Scientific document classification is a critical task for a wide range of applications, but the
cost of collecting human-labeled data can be prohibitive. We study scientific document …
Metadata-induced contrastive learning for zero-shot multi-label text classification
Large-scale multi-label text classification (LMTC) aims to associate a document with its
relevant labels from a large candidate set. Most existing LMTC approaches rely on massive …
Effective seed-guided topic discovery by integrating multiple types of contexts
Instead of mining coherent topics from a given text corpus in a completely unsupervised
manner, seed-guided topic discovery methods leverage user-provided seed words to extract …
Weakly supervised multi-label classification of full-text scientific papers
Instead of relying on human-annotated training samples to build a classifier, weakly
supervised scientific paper classification aims to classify papers only using category …
PIEClass: Weakly-supervised text classification with prompting and noise-robust iterative ensemble training
Weakly-supervised text classification trains a classifier using the label name of each target
class as the only supervision, which largely reduces human annotation efforts. Most existing …
CL-WSTC: Continual learning for weakly supervised text classification on the internet
Continual text classification is an important research direction in Web mining. Existing works
are limited to supervised approaches relying on abundant labeled data, but in the open and …
Seed-guided topic discovery with out-of-vocabulary seeds
Discovering latent topics from text corpora has been studied for decades. Many existing
topic models adopt a fully unsupervised setting, and their discovered topics may not cater to …
Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery
Graphs and texts are two key modalities in data mining. In many cases, the data presents a
mixture of the two modalities and the information is often complementary: in e-commerce …