Does writing with language models reduce content diversity?

V Padmakumar, H He - arXiv preprint arXiv:2309.05196, 2023 - arxiv.org
Large language models (LLMs) have led to a surge in collaborative writing with model
assistance. As different users incorporate suggestions from the same model, there is a risk of …
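
The snippet breaks off above; as brief context, studies of this kind typically quantify content diversity with lexical measures such as distinct-n, the fraction of n-grams in a corpus that are unique. A minimal sketch (the function and metric choice are illustrative, not necessarily the paper's exact setup):

    from collections import Counter

    def distinct_n(texts, n=2):
        # Fraction of unique n-grams across a corpus; higher = more diverse.
        # Illustrative sketch, not necessarily the paper's exact metric.
        ngrams = Counter()
        for text in texts:
            tokens = text.split()
            for i in range(len(tokens) - n + 1):
                ngrams[tuple(tokens[i:i + n])] += 1
        total = sum(ngrams.values())
        return len(ngrams) / total if total else 0.0

    print(distinct_n(["the cat sat", "the cat sat"]))     # 0.5: repeated text
    print(distinct_n(["the cat sat", "a dog ran home"]))  # 1.0: all n-grams unique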

MAUVE: Measuring the gap between neural text and human text using divergence frontiers

K Pillutla, S Swayamdipta, R Zellers… - Advances in …, 2021 - proceedings.neurips.cc
As major progress is made in open-ended text generation, measuring how close machine-
generated text is to human language remains a critical open problem. We introduce MAUVE …
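
For orientation, the divergence frontier named in the title is built from mixtures of the human distribution P and the model distribution Q; a hedged sketch of the construction (c is a scaling constant; the exact parameterization should be checked against the paper):

    $$ R_\lambda = \lambda P + (1-\lambda)\,Q, \qquad
       \mathcal{C}(P,Q) = \Big\{ \big(e^{-c\,\mathrm{KL}(Q \| R_\lambda)},\;
       e^{-c\,\mathrm{KL}(P \| R_\lambda)}\big) : \lambda \in (0,1) \Big\} $$

    MAUVE(P, Q) is the area under this curve; it lies in (0, 1] and equals 1
    exactly when P = Q.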

DistiLLM: Towards streamlined distillation for large language models

J Ko, S Kim, T Chen, SY Yun - arXiv preprint arXiv:2402.03898, 2024 - arxiv.org
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller
student model, reducing its inference cost and memory footprint while preserving model …
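
The abstract is cut off here; as background, the baseline that methods like this refine is vanilla (Hinton-style) knowledge distillation. A minimal sketch in PyTorch (this is the generic KD loss, not DistiLLM's specific objective):

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Hard-label cross-entropy plus temperature-scaled KL to the teacher.
        # Generic KD sketch; DistiLLM itself modifies the divergence and the
        # training scheme beyond this.
        ce = F.cross_entropy(student_logits, labels)
        kl = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return (1 - alpha) * ce + alpha * kl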

MAUVE scores for generative models: Theory and practice

K Pillutla, L Liu, J Thickstun, S Welleck… - Journal of Machine …, 2023 - jmlr.org
Generative artificial intelligence has made significant strides, producing text
indistinguishable from human prose and remarkably photorealistic images. Automatically …

RényiCL: Contrastive representation learning with skew Rényi divergence

K Lee, J Shin - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Contrastive representation learning seeks to acquire useful representations by estimating
the shared information between multiple views of data. Here, the choice of data …
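
For context, the Rényi divergence of order α in the title is, in its standard form, the first expression below; a "skew" variant, as the term is commonly used, mixes the second argument (hedged sketch; the paper's exact definition should be checked):

    $$ D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
       \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{p(x)}{q(x)} \right)^{\!\alpha} \right],
       \qquad
       D_\alpha^{\text{skew}}(P \,\|\, Q) = D_\alpha\big(P \,\|\, (1-\lambda) P + \lambda Q\big) $$

    Skewing keeps the divergence finite when P and Q have mismatched supports,
    which makes estimation better behaved.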

SynthesizRR: Generating diverse datasets with retrieval augmentation

A Divekar, G Durrett - arXiv preprint arXiv:2405.10040, 2024 - arxiv.org
It is often desirable to distill the capabilities of large language models (LLMs) into smaller
student models due to compute and memory constraints. One way to do this for classification …

On the usefulness of embeddings, clusters and strings for text generator evaluation

T Pimentel, C Meister, R Cotterell - arXiv preprint arXiv:2205.16001, 2022 - arxiv.org
A good automatic evaluation metric for language generation ideally correlates highly with
human judgements of text quality. Yet, there is a dearth of such metrics, which inhibits the …

A practical guide to sample-based statistical distances for evaluating generative models in science

S Bischoff, A Darcher, M Deistler, R Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative models are invaluable in many fields of science because of their ability to
capture high-dimensional and complicated distributions, such as photo-realistic images …
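
The snippet is truncated; as one concrete instance of the sample-based distances a guide like this covers, here is a minimal maximum mean discrepancy (MMD) estimate with an RBF kernel. Purely illustrative (a biased V-statistic; the paper surveys several distances, not only this one):

    import numpy as np

    def mmd_rbf(X, Y, sigma=1.0):
        # Squared MMD between samples X (n, d) and Y (m, d) under an RBF kernel.
        def k(A, B):
            sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
            return np.exp(-sq / (2 * sigma**2))
        return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(500, 2))
    Y = rng.normal(0.5, 1.0, size=(500, 2))
    print(mmd_rbf(X, X))  # exactly 0 for identical samples
    print(mmd_rbf(X, Y))  # > 0 for the shifted distribution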

Categorical Generative Model Evaluation via Synthetic Distribution Coarsening

F Regol, M Coates - International Conference on Artificial …, 2024 - proceedings.mlr.press
As we expect to see rapid integration of generative models into our day-to-day lives, the
development of rigorous methods of evaluation and analysis for generative models has …

KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server

W Wang, X Liang, R Ye, J Chai, S Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has led many parties to fine-tune LLMs on
their own private data. However, this practice raises privacy concerns due to the …