Topic discovery via latent space clustering of pretrained language model representations

Y Meng, Y Zhang, J Huang, Y Zhang… - Proceedings of the ACM …, 2022 - dl.acm.org
Topic models have been the prominent tools for automatic topic discovery from text corpora.
Despite their effectiveness, topic models suffer from several limitations including the inability …

The effect of metadata on scientific literature tagging: A cross-field cross-model study

Y Zhang, B **, Q Zhu, Y Meng, J Han - Proceedings of the ACM Web …, 2023 - dl.acm.org
Due to the exponential growth of scientific publications on the Web, there is a pressing need
to tag each paper with fine-grained topics so that researchers can track their interested fields …

Weakly-supervised scientific document classification via retrieval-augmented multi-stage training

R Xu, Y Yu, J Ho, C Yang - Proceedings of the 46th International ACM …, 2023 - dl.acm.org
Scientific document classification is a critical task for a wide range of applications, but the
cost of collecting human-labeled data can be prohibitive. We study scientific document …

Metadata-induced contrastive learning for zero-shot multi-label text classification

Y Zhang, Z Shen, CH Wu, B **e, J Hao… - Proceedings of the …, 2022 - dl.acm.org
Large-scale multi-label text classification (LMTC) aims to associate a document with its
relevant labels from a large candidate set. Most existing LMTC approaches rely on massive …

Effective seed-guided topic discovery by integrating multiple types of contexts

Y Zhang, Y Zhang, M Michalski, Y Jiang… - Proceedings of the …, 2023 - dl.acm.org
Instead of mining coherent topics from a given text corpus in a completely unsupervised
manner, seed-guided topic discovery methods leverage user-provided seed words to extract …

Weakly supervised multi-label classification of full-text scientific papers

Y Zhang, B **, X Chen, Y Shen, Y Zhang… - Proceedings of the 29th …, 2023 - dl.acm.org
Instead of relying on human-annotated training samples to build a classifier, weakly
supervised scientific paper classification aims to classify papers only using category …

PIEClass: Weakly-supervised text classification with prompting and noise-robust iterative ensemble training

Y Zhang, M Jiang, Y Meng, Y Zhang, J Han - arXiv preprint arXiv …, 2023 - arxiv.org
Weakly-supervised text classification trains a classifier using the label name of each target
class as the only supervision, which largely reduces human annotation efforts. Most existing …

CL-WSTC: Continual learning for weakly supervised text classification on the internet

M Li, J Zhu, X Yang, Y Yang, Q Gao… - Proceedings of the ACM …, 2023 - dl.acm.org
Continual text classification is an important research direction in Web mining. Existing works
are limited to supervised approaches relying on abundant labeled data, but in the open and …

Seed-guided topic discovery with out-of-vocabulary seeds

Y Zhang, Y Meng, X Wang, S Wang, J Han - arXiv preprint arXiv …, 2022 - arxiv.org
Discovering latent topics from text corpora has been studied for decades. Many existing
topic models adopt a fully unsupervised setting, and their discovered topics may not cater to …

Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery

B **, Y Zhang, S Li, J Han - Proceedings of the 17th ACM International …, 2024 - dl.acm.org
Graphs and texts are two key modalities in data mining. In many cases, the data presents a
mixture of the two modalities and the information is often complementary: in e-commerce …