Differentially Private Synthetic Data via Foundation Model APIs 2: Text

1University of Illinois at Urbana-Champaign, 2Microsoft Research, 3Sun Yat-sen University, 4University of Chicago

ICML 2024 (Spotlight)

Overview

Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of LLMs on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models.

In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. Results on three benchmark datasets demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications.


Augmented Private Evolution (Aug-PE)

We use two private and two synthetic samples (reviews for the "restaurant" class) to illustrate Aug-PE.

  • Step 1 (RANDOM_API): we use prompts to generate random samples from the LLM.
  • Step 2: we iteratively go through Steps 2.1-2.3 to refine the synthetic samples towards the private samples (a minimal code sketch of this loop is given below).
    • Step 2.1: each private sample votes for its closest synthetic sample in the embedding space induced by an embedding model. "A great spot for pizza" gets 2 votes, and the other sample gets 0 votes. We then add Gaussian noise to the votes to ensure DP, which gives us the DP Nearest Neighbor Histogram (DP_NN_HISTOGRAM).
    • Step 2.2: we resample the generated texts according to the histogram. In this example, we assume that only "A great spot for pizza" remains.
    • Step 2.3 (VARIATION_API): we use prompts to ask the LLM to generate new, similar samples, which serve as the initial synthetic samples for the next iteration.

The prompts are simplified for illustration; see our paper for the complete prompts.
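To make the loop concrete, here is a minimal Python sketch of the Aug-PE iterations under simplifying assumptions; it is an illustration, not the released implementation. The functions llm_random_api, llm_variation_api, and embed are hypothetical stand-ins for RANDOM_API, VARIATION_API, and the embedding model, and the noise scale is exposed as a single noise_multiplier parameter instead of the full DP accounting used in the paper.

    import numpy as np

    def aug_pe(private_texts, n_synthetic, n_iterations, noise_multiplier,
               llm_random_api, llm_variation_api, embed):
        # Step 1 (RANDOM_API): draw initial synthetic samples from the LLM.
        synthetic = llm_random_api(n_synthetic)

        for _ in range(n_iterations):
            # Step 2.1 (DP_NN_HISTOGRAM): each private sample votes for its
            # nearest synthetic sample in the embedding space.
            priv_emb = np.asarray(embed(private_texts))   # (n_private, d)
            syn_emb = np.asarray(embed(synthetic))        # (n_synthetic, d)
            dists = np.linalg.norm(priv_emb[:, None, :] - syn_emb[None, :, :], axis=-1)
            votes = np.bincount(dists.argmin(axis=1), minlength=len(synthetic)).astype(float)
            # Add Gaussian noise to the vote counts to ensure DP.
            votes += np.random.normal(0.0, noise_multiplier, size=votes.shape)

            # Step 2.2: resample synthetic texts in proportion to the noisy votes.
            probs = np.clip(votes, 0.0, None)
            probs = probs / probs.sum() if probs.sum() > 0 else np.full(len(probs), 1.0 / len(probs))
            picked = np.random.choice(len(synthetic), size=n_synthetic, p=probs)

            # Step 2.3 (VARIATION_API): generate similar variants of the survivors,
            # which become the synthetic samples for the next iteration.
            synthetic = [llm_variation_api(synthetic[i]) for i in picked]

        return synthetic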


Compared to image generation in the original PE paper, text generation introduces unique challenges: 1) the core components of PE, such as the embedding model, RANDOM_API, and VARIATION_API, require domain-specific designs for text; 2) unlike images, which have a fixed dimensionality, the length of text can vary; 3) the original PE algorithm yields unsatisfactory text quality. Therefore, in our augmented algorithm, Aug-PE, we explore the design choices for each component and introduce new algorithmic techniques to increase the diversity and quality of the generated text.

Performance of Aug-PE

We compare Aug-PE to two SOTA methods involving DP finetuning:

  • DP-FT-Generator (Yue et al., 2023): finetuning a generator (e.g., GPT-2) with DP-SGD (note that we cannot DP finetune the closed-source GPT-3.5) and using the synthetic texts to finetune a downstream model with non-private SGD.
  • DP-FT-Downstream (Li et al., 2022; Yu et al., 2022): finetuning the downstream model on real data with DP-SGD. This baseline is not a direct competitor to our method, since our goal is to generate DP synthetic data rather than merely train a downstream model.

With the same generator, Aug-PE is on par with DP finetuning in some cases

We evaluate the downstream model accuracy of Aug-PE and the two baselines across different data generators. The highest accuracy across all methods is bolded, and the highest accuracy obtained by Aug-PE is underlined.

  • Compared to DP-FT-Generator, the downstream accuracy of Aug-PE is higher (↑) in some cases when using a GPT-2-series data generator of the same size.
  • Compared to the traditional DP-FT-Downstream approach, Aug-PE can also achieve higher accuracy under DP.

Aug-PE is compatible with closed-source LLMs for improved utility, where DP finetuning is infeasible

Leveraging the inherent knowledge within a stronger LLM, GPT-3.5, Aug-PE achieves higher accuracy, especially on the challenging OpenReview and PubMed datasets, outperforming DP-FT-Generator by a notable margin.

Many powerful LLMs, such as GPT-4, Claude, and Bard, are only accessible through inference APIs, so DP finetuning them is not feasible. Although standard finetuning APIs are provided for some of these models, DP finetuning requires a special implementation (i.e., DP-SGD), and no model provider offers such a custom API to date.
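As a concrete illustration of the API-only setting, the two text APIs can be backed by an inference endpoint such as OpenAI's chat completions API. The sketch below uses simplified placeholder prompts and gpt-3.5-turbo as an assumed model choice; see our paper for the actual prompts.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def chat(prompt):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def llm_random_api(n_samples, label="restaurant"):
        # RANDOM_API: generate fresh label-conditioned samples from scratch.
        return [chat(f"Write a short {label} review.") for _ in range(n_samples)]

    def llm_variation_api(text):
        # VARIATION_API: ask the LLM for a new sample similar to an existing one.
        return chat(f"Rewrite the following review so it stays similar in topic and style:\n{text}")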


Aug-PE is compatible with open-source LLMs, where DP finetuning is hard to implement

Using powerful open-source LLMs, such as the Mixture-of-Experts model Mixtral-8x7B-v0.1, as the data generator leads to improved downstream accuracy for Aug-PE on the three datasets.

Finetuning these open-source LLMs with DP is resource-intensive and non-trivial to implement due to the need to calculate per-sample gradients in DP-SGD. The state-of-the-art DP synthetic text approaches are unfortunately still based on GPT-2 (Yue et al., 2023).
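For intuition on this cost, below is a minimal sketch of a single DP-SGD step in plain PyTorch (a simplified illustration, not the baselines' implementation, which would typically rely on a library such as Opacus): each example needs its own gradient, which is clipped before noisy aggregation, and this per-sample bookkeeping is what makes DP finetuning of billion-parameter LLMs expensive.

    import torch

    def dp_sgd_step(model, loss_fn, batch_inputs, batch_labels,
                    clip_norm=1.0, noise_multiplier=1.0, lr=1e-4):
        summed_grads = [torch.zeros_like(p) for p in model.parameters()]
        # Per-sample gradients: one backward pass per example (or vmap-style tricks),
        # which is far more memory- and compute-intensive than standard SGD.
        for x, y in zip(batch_inputs, batch_labels):
            model.zero_grad()
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
                     for p in model.parameters()]
            # Clip each example's gradient to bound its contribution (sensitivity).
            total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
            for acc, g in zip(summed_grads, grads):
                acc += g * scale
        # Add Gaussian noise to the summed clipped gradients, then take a step.
        batch_size = len(batch_inputs)
        with torch.no_grad():
            for p, g in zip(model.parameters(), summed_grads):
                noise = torch.normal(0.0, noise_multiplier * clip_norm, size=g.shape)
                p -= lr * (g + noise) / batch_size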


Aug-PE is more efficient than DP finetuning generator

With only inference API access, Aug-PE is more efficient than DP-FT-Generator, which requires DP-SGD finetuning, for generating 100k synthetic samples on Yelp (ϵ = 1). The running time of Aug-PE mainly depends on the number of API calls, which is associated with the hyperparameter L.


Aug-PE produces favorable text length distributions

The text length distribution of synthetic samples generated from GPT-3.5 through Aug-PE more closely matches the original Yelp data distribution across iterations, due to our adaptive sequence length mechanism.


Properties of Aug-PE

The high-quality synthetic text from Aug-PE is better utilized by larger downstream models

On PubMed synthetic texts, the next-word prediction accuracy increases when using larger downstream models. Under both ϵ = 1 and ϵ = ∞, the smallest model, BERT-Tiny, favors the synthetic texts from DP-FT-Generator with GPT-2-Large, while larger models such as LLaMA-2 favor the synthetic texts from Aug-PE with GPT-3.5.

This observation underscores the importance of choosing downstream models of a suitable size; employing overly small models could underestimate the quality of synthetic texts produced by Aug-PE with GPT-3.5. We hypothesize that this is because (1) GPT-3.5-generated texts may already be of higher quality in terms of vocabulary, syntax, and semantic coherence than texts generated by a finetuned GPT-2-Large; and (2) larger downstream LMs such as LLaMA-2 can better understand and exploit the nuances of synthetic texts than BERT-Tiny.


More powerful embedding model leads to higher utility for Aug-PE generated texts

For example, larger embedding models such as sentence-t5-xl can more accurately capture the nuances of texts in the embedding space, leading to higher utility for GPT-2 generated texts.
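For reference, swapping the embedding model in such a pipeline is typically a one-line change; the sketch below assumes the Sentence Transformers library and the sentence-transformers/sentence-t5-xl checkpoint as one possible choice.

    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("sentence-transformers/sentence-t5-xl")

    def embed(texts):
        # Returns an (n_texts, d) array of embeddings used for the
        # nearest-neighbor voting in Aug-PE (DP_NN_HISTOGRAM).
        return embedder.encode(texts, normalize_embeddings=True)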


Aug-PE outperforms PE

We apply the same API designs and models to the original PE algorithm (Lin et al., 2024) to support text generation. Aug-PE achieves notable improvements over PE with GPT-2 on all datasets, e.g., +22.6% on Yelp rating classification. This shows that the new algorithmic techniques introduced in Aug-PE are effective for diverse and high-quality text generation.


Aug-PE effectively uses private data to guide synthetic data selection

The initial samples (RANDOM_API) or their variants (RANDOM_API + VARIATION_API) exhibit limited utility. However, synthetic text quality improves notably after just one iteration of Aug-PE guided by private data, and the improvement continues to grow over T iterations.

See more results in our paper.

Here are some examples of Aug-PE-generated samples with GPT-3.5 under DP privacy budget ϵ = 1. Each row represents an independent sample drawn from the synthetic dataset obtained in the last iteration of Aug-PE.

[Table: example synthetic texts and their corresponding labels]

BibTeX

@article{xie2024differentially,
  title={Differentially Private Synthetic Data via Foundation Model APIs 2: Text},
  author={Chulin Xie and Zinan Lin and Arturs Backurs and Sivakanth Gopi and Da Yu and Huseyin A Inan and Harsha Nori and Haotian Jiang and Huishuai Zhang and Yin Tat Lee and Bo Li and Sergey Yekhanin},
  journal={ICML},
  year={2024}
}