Prompt public large language models to synthesize data for private on-device applications

S Wu, Z Xu, Y Zhang, Y Zhang, D Ramage - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-training on public data is an effective method to improve the performance of federated
learning (FL) with differential privacy (DP). This paper investigates how large language …

FedSpaLLM: Federated pruning of large language models

G Bai, Y Li, Z Li, L Zhao, K Kim - arXiv preprint arXiv:2410.14852, 2024 - arxiv.org
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to
deploy due to their high computational and storage demands. Pruning can reduce model …

Contrastive Private Data Synthesis via Weighted Multi-PLM Fusion

T Zou, Y Liu, P Li, Y Xiong, J Zhang, J Liu, X Ye… - arXiv preprint arXiv …, 2025 - arxiv.org
Substantial quantity and high quality are the golden rules for building a good training dataset,
with sample privacy protection equally important. Generating synthetic samples that …

Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning

D Jhunjhunwala, P Sharma, Z Xu, G Joshi - arXiv preprint arXiv …, 2025 - arxiv.org
Initializing with pre-trained models when learning on downstream tasks is becoming
standard practice in machine learning. Several recent works explore the benefits of pre …

Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models

Z Lin, T Baltrusaitis, S Yekhanin - arXiv preprint arXiv:2502.05505, 2025 - arxiv.org
Differentially private (DP) synthetic data, which closely resembles the original private data
while maintaining strong privacy guarantees, has become a key tool for unlocking the value …

LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples

H Wu, D Klabjan - arXiv preprint arXiv:2410.19114, 2024 - arxiv.org
Federated Learning (FL) is a collaborative, privacy-preserving machine learning framework
that enables multiple participants to train a single global model. However, the recent advent …

Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data?

M Swanberg, R McKenna, E Roth, A Cheu… - arXiv preprint arXiv …, 2025 - arxiv.org
Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private
data. Recent advancements in large language models (LLMs) have inspired a number of …

Strategies for Learning From Non-Ideal Sources of Data

C Hou - 2024 - search.proquest.com
Machine learning (ML) generally performs well when there is unfettered access to
large quantities of clean, relevant data. However, harnessing the benefits of large, clean …

Effectively learning from data and generating data in differentially private machine learning

X Tang - 2024 - search.proquest.com
Machine learning models are susceptible to a range of attacks that exploit data
leakage from trained models. Differential Privacy (DP) is the gold standard for quantifying …