Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification

S Hu, N Ding, H Wang, Z Liu, J Wang, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Tuning pre-trained language models (PLMs) with task-specific prompts has been a
promising approach for text classification. Particularly, previous studies suggest that prompt …

Self-training: A survey

MR Amini, V Feofanov, L Pauletto, L Hadjadj… - Neurocomputing, 2025 - Elsevier
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …

COCO-LM: Correcting and contrasting text sequences for language model pretraining

Y Meng, C Xiong, P Bajaj, P Bennett… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a self-supervised learning framework, COCO-LM, that pretrains Language
Models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style …

Text classification using embeddings: a survey

LS da Costa, IL Oliveira, R Fileto - Knowledge and Information Systems, 2023 - Springer
Text classification results can be hindered when just the bag-of-words model is used for
representing features, because it ignores word order and senses, which can vary with the …

Harnessing artificial intelligence to combat online hate: Exploring the challenges and opportunities of large language models in hate speech detection

T Kumarage, A Bhattacharjee, J Garland - arXiv preprint arXiv:2403.08035, 2024 - arxiv.org
Large language models (LLMs) excel in many diverse applications beyond language
generation, e.g., translation, summarization, and sentiment analysis. One intriguing …

Fine-tuning pre-trained language model with weak supervision: A contrastive-regularized self-training approach

Y Yu, S Zuo, H Jiang, W Ren, T Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
Fine-tuned pre-trained language models (LMs) have achieved enormous success in many
natural language processing (NLP) tasks, but they still require excessive labeled data in the …

Topic discovery via latent space clustering of pretrained language model representations

Y Meng, Y Zhang, J Huang, Y Zhang… - Proceedings of the ACM …, 2022 - dl.acm.org
Topic models have been the prominent tools for automatic topic discovery from text corpora.
Despite their effectiveness, topic models suffer from several limitations including the inability …

Decoupling knowledge from memorization: Retrieval-augmented prompt learning

X Chen, L Li, N Zhang, X Liang… - Advances in …, 2022 - proceedings.neurips.cc
Prompt learning approaches have made waves in natural language processing by inducing
better few-shot performance while they still follow a parametric-based learning paradigm; …

Distantly-supervised named entity recognition with noise-robust learning and language model augmented self-training

Y Meng, Y Zhang, J Huang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
We study the problem of training named entity recognition (NER) models using only distantly-
labeled data, which can be automatically obtained by matching entity mentions in the raw …

PRBoost: Prompt-based rule discovery and boosting for interactive weakly-supervised learning

R Zhang, Y Yu, P Shetty, L Song, C Zhang - arXiv preprint arXiv …, 2022 - arxiv.org
Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity
on many NLP tasks, but manually designing a comprehensive, high-quality labeling rule set …