Issued 13 April, 2025
Implications of the DE-COP Membership Inference Attack Method
A Method to Identify Potential "Access Violations" of Copyrighted Material by AI Models - or Is It a Fluke?
In a recently published working paper, the Social Science Research Council's AI Disclosures Project presents a method to detect whether or not an AI model was trained on a specific dataset.
The dataset consisted of 34 public and non-public books owned by O'Reilly Media, with the latter books available only behind an online paywall.
The method is a two-stage process:
- a DE-COP membership inference attack, used to determine if a specific data point (e.g., an image or text) was used to train an AI model, and
- AUROC scoring, a way to measure how well a membership inference attack can distinguish between two unrelated sets of input data.
The researchers applied this method to OpenAI's GPT-4o and GPT-3.5 Turbo models (a "big" and a "small" model) to probe whether books owned by O'Reilly Media were used to train them.
We consider the legal implications of such a testing method, its potential as a means of establishing "access violations" for copyrighted material, and its impact on the AI industry as a whole.
The Method
The purpose of the investigation was to determine whether OpenAI's GPT-4o and GPT-3.5 Turbo models were trained on non-public books (i.e., those behind a paywall) without paying for the content.
The first step involved the DE-COP ("Detecting Copyrighted content") membership inference attack, a method to determine whether a specific data point (e.g., an image or text) was used to train an AI model.
Here is a simplistic version of how the attack works:
- First, two sets of input data are organized: data that existed before the model was trained (a "member") and data that only came into existence after training occurred (a "non-member").
- The two types of input data are used, separately, to query the AI model.
- The model's confidence in its responses (i.e., its prediction probabilities) to the input data is scored and recorded.
- The scores range between 0 and 1, with 0.5 indicating that the model is "not sure" and values towards 1 indicating growing confidence.
- If the model is overconfident (i.e., assigns a high probability to an input), the presumption is that the input was likely in the model's training data.
- If the model is less confident in its response (scoring towards 0), the presumption is that the model was not trained on the data.
- Finally, the attack gathers the model's confidence level for each query and sorts the scores from highest to lowest, producing a "Ranked by Confidence" chart. The chart lists each query, labeled with its score (0 to 1) and its group ("member" or "non-member").
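The scoring-and-ranking loop described above can be sketched in Python. This is an illustrative toy, not the researchers' code: `model_confidence` is a hypothetical stand-in for whatever call would query the target model and read back its prediction probability.

```python
def rank_by_confidence(queries, model_confidence):
    """Score each (passage, group) query and sort by descending confidence.

    queries: list of (passage, group) pairs, where group is "member"
    (pre-training data) or "non-member" (post-training data).
    Returns the "Ranked by Confidence" chart as (score, passage, group) rows.
    """
    scored = [(model_confidence(passage), passage, group)
              for passage, group in queries]
    return sorted(scored, reverse=True)  # highest confidence first

# Toy stand-in for a real model query: member passages score high.
def fake_confidence(passage):
    return 0.9 if passage.startswith("member") else 0.3

chart = rank_by_confidence(
    [("member passage A", "member"), ("unseen passage B", "non-member")],
    fake_confidence,
)
```

In this toy run, the member passage lands at the top of the chart and the unseen passage at the bottom, which is exactly the separation the attack looks for.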
The second step involved AUROC (Area Under the Receiver Operating Characteristic curve) scoring, a way to measure how well a membership inference attack (like DE-COP) can distinguish member input data from non-member input data.
Here is a simplistic version of how it works:
- It takes the "Ranked by Confidence" output of DE-COP and, at each confidence threshold, calculates the percentage of members that DE-COP correctly flagged (the True Positive Rate) and the percentage of non-members that DE-COP incorrectly flagged (the False Positive Rate).
- It produces a score between 0 and 1, where 0.5 represents the "inference attacker" doing no better than random chance at discriminating between the two datasets, and values closer to 1 indicate a strong ability to discriminate between them.
- A perfect attack (AUROC = 1.0) would have all members ranked above non-members. For instance, an AUROC of 0.8 means that a randomly chosen member is ranked above a randomly chosen non-member 80% of the time, much better than random (0.5).
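As a sketch of what the AUROC score measures, here is a minimal, self-contained implementation, assumed for illustration rather than taken from the paper. It uses the rank-based equivalence: AUROC equals the probability that a randomly chosen member score outranks a randomly chosen non-member score.

```python
def auroc(member_scores, nonmember_scores):
    """Probability that a random member outranks a random non-member.

    Equivalent to sweeping every confidence threshold, computing the
    true-positive and false-positive rates, and integrating the curve.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(member_scores) * len(nonmember_scores))
```

Perfect separation, e.g. `auroc([0.9, 0.8], [0.4, 0.3])`, returns 1.0; fully interleaved scores, e.g. `auroc([0.9, 0.3], [0.8, 0.4])`, return 0.5, the random-chance baseline.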
As mentioned above, the research was conducted on a dataset of 34 copyrighted O'Reilly Media books, containing both non-public (behind a paywall) and public (freely available) text within each book.
The books were split into a total of 13,962 paragraphs, which were used to calculate an initial mean DE-COP attacker score for each book.
A single AUROC score was then calculated across all books for the OpenAI models (i.e., GPT-4o, GPT-3.5 Turbo).
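The aggregation step, collapsing per-paragraph scores into one attacker score per book, might look like the following sketch. The dictionary shape is my assumption for illustration, not the paper's actual data format.

```python
from statistics import mean

def book_level_scores(paragraph_scores):
    """Collapse per-paragraph DE-COP scores into one mean score per book.

    paragraph_scores: {book_title: [score for each paragraph]}
    Returns {book_title: mean attacker score}, ready to feed into a
    single AUROC calculation across all books.
    """
    return {book: mean(scores) for book, scores in paragraph_scores.items()}
```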
The report concludes that, using this method, OpenAI's GPT-4o was "over-confident" when tested with both O'Reilly's publicly available input data and its non-public (paywalled) input data, whereas GPT-3.5 Turbo was not.
The Story Version
One of the critical aspects of creating a machine model that is capable of learning is to provide it with a very large body of "training data", which may include public and non-public data.
Let’s say the training data involves thousands of photos of cats and dogs.
During training, a Machine Learning (ML) algorithm learns the “characteristics” of this data in order to be able to distinguish between cat images and dog images, in general.
Now, let's say PetStore suspects you (as model owner) used their copyrighted photos without permission.
Your model is published, but they can't see your training data (it's locked away), so they hire researchers to test your model using a clever trick (the DE-COP attack).
The researchers trick your model by showing it specific photos from PetStore's collection alongside similar photos from someone else's collection and asking, "Which one's real?" If your model consistently picks the PetStore photos, it suggests it was trained on them.
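That "which one's real?" quiz can be sketched as a multiple-choice prompt builder. The function below is hypothetical scaffolding for the story, not the DE-COP implementation: it shuffles the verbatim item in with decoys and remembers which option letter is correct.

```python
import random

def build_quiz(true_passage, decoys):
    """Build a multiple-choice "which one's real?" prompt.

    Returns (prompt_text, correct_letter). A model that reliably picks
    correct_letter likely saw true_passage during training.
    """
    options = decoys + [true_passage]
    random.shuffle(options)
    lines = ["Which of the following is the verbatim original?"]
    for i, option in enumerate(options):
        lines.append(f"{chr(65 + i)}. {option}")  # A., B., C., ...
    correct = chr(65 + options.index(true_passage))
    return "\n".join(lines), correct
```

Repeating this over many items, and recording how often the model picks the correct letter, yields the confidence scores that the ranking and AUROC steps consume.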
In the OpenAI/O'Reilly situation, the researchers used the DE-COP attack to show that GPT-4o's outputs align closely with O'Reilly's copyrighted content, a sign that OpenAI trained its model on this material.
The conclusion reached through the DE-COP/AUROC methods in the O'Reilly case has serious implications for the AI sector.
If further testing proves this method to be a legitimate test for gauging "access violations", then publishers, artists, and governments will use it to audit AI systems in every jurisdiction across the globe.
At that point, the DE-COP attack becomes an instrument for checking illegitimate use of copyrighted material: no longer just a research tool, but a litigation tool. If pursued, this case could be the "Napster moment" for AI.
The Tension
It is generally argued that "inference attacks" on AI models, of any kind, may compromise the model owner's intellectual property. That primary IP consists of:
- the model architecture and weights: the proprietary design and learned parameters, often guarded as trade secrets or protected under patent law;
- training data selection and curation: the specific choices and pre-processing of datasets, which can be considered a trade secret; and
- the output generation process: the manner in which the model processes inputs to produce outputs, if it can be shown that the process involves proprietary techniques.
I bring this up to emphasize a tension that the report may have stirred, particularly when one recalls that many legal systems in the democratic world prohibit evidence obtained through illegal means from being admitted in court.
This principle is commonly known as the "exclusionary rule" or the doctrine of the "fruit of the poisonous tree". One could therefore declare a perceived tension in the matter, raised here if only for argument's sake.
However, technical experts claim that DE-COP doesn't extract code or the exact patterns an AI model learned. It just asks questions like, "Is this PetStore cat photo the real one?" If the model nails the answers, it hints that it saw those photos before (in training).
This reveals something about the training data (i.e., the use of PetStore's photos), but it doesn't give away the model's inner workings or its full dataset.
Thanks for reading! For exclusive insights and to help keep this blog thriving, join our Patreon community today!