TopicGPT: A prompt-based topic modeling framework

CM Pham, A Hoyle, S Sun, P Resnik, M Iyyer - arXiv preprint arXiv …, 2023 - arxiv.org
Topic modeling is a well-established technique for exploring text corpora. Conventional
topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea …

Text classification using label names only: A language model self-training approach

Y Meng, Y Zhang, J Huang, C Xiong, H Ji… - arXiv preprint arXiv …, 2020 - arxiv.org
Current text classification methods typically require a good number of human-labeled
documents as training data, which can be costly and difficult to obtain in real applications …

Effective neural topic modeling with embedding clustering regularization

X Wu, X Dong, TT Nguyen… - … Conference on Machine …, 2023 - proceedings.mlr.press
Topic models have been prevalent for decades with various applications. However, existing
topic models commonly suffer from the notorious topic collapsing: discovered topics …

Topic discovery via latent space clustering of pretrained language model representations

Y Meng, Y Zhang, J Huang, Y Zhang… - Proceedings of the ACM …, 2022 - dl.acm.org
Topic models have been the prominent tools for automatic topic discovery from text corpora.
Despite their effectiveness, topic models suffer from several limitations including the inability …

X-Class: Text classification with extremely weak supervision

Z Wang, D Mekala, J Shang - arXiv preprint arXiv:2010.12794, 2020 - arxiv.org
In this paper, we explore text classification with extremely weak supervision, i.e., only relying
on the surface text of class names. This is a more challenging setting than the seed-driven …

Goal-driven explainable clustering via language descriptions

Z Wang, J Shang, R Zhong - arXiv preprint arXiv:2305.13749, 2023 - arxiv.org
Unsupervised clustering is widely used to explore large corpora, but existing formulations
neither consider the users' goals nor explain clusters' meanings. We propose a new task …

Hierarchical topic mining via joint spherical tree and text embedding

Y Meng, Y Zhang, J Huang, Y Zhang, C Zhang… - Proceedings of the 26th …, 2020 - dl.acm.org
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since
topic correlations are ubiquitous in massive text corpora. To account for potential …

Neighborhood-regularized self-training for learning with few labels

R Xu, Y Yu, H Cui, X Kan, Y Zhu, J Ho… - Proceedings of the …, 2023 - ojs.aaai.org
Training deep neural networks (DNNs) with limited supervision has been a popular research
topic as it can significantly alleviate the annotation burden. Self-training has been …

Comprehensive named entity recognition on CORD-19 with distant or weak supervision

X Wang, X Song, B Li, Y Guan, J Han - arXiv preprint arXiv:2003.12218, 2020 - arxiv.org
We created this CORD-NER dataset with comprehensive named entity recognition (NER) on
the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This …

Beyond prompting: Making pre-trained language models better zero-shot learners by clustering representations

Y Fei, P Nie, Z Meng, R Wattenhofer… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot
learners. However, most existing zero-shot methods involve heavy human engineering or …