ChatGPT-maker OpenAI accused of string of data protection breaches in GDPR complaint filed by privacy researcher

Questions about ChatGPT-maker OpenAI’s ability to comply with European privacy rules are in the frame again after a detailed complaint was filed with the Polish data protection authority yesterday.

The complaint, which TechCrunch has reviewed, alleges the US-based AI giant is in breach of the bloc’s General Data Protection Regulation (GDPR) across a sweep of dimensions: lawful basis, transparency, fairness, data access rights, and privacy by design are all areas in which it argues OpenAI is infringing EU privacy rules (namely Articles 5(1)(a), 12, 15, 16 and 25(1) of the GDPR).

Indeed, the complaint frames the novel generative AI technology, and its maker’s approach to developing and operating the viral tool, as essentially a systematic breach of the pan-EU regime. A further suggestion, therefore, is that OpenAI has overlooked another GDPR requirement: to undertake prior consultation with regulators (Article 36). Had it conducted a proactive assessment that identified high risks to people’s rights absent mitigating measures, that should have given it pause for thought. Yet OpenAI apparently rolled ahead and launched ChatGPT in Europe without engaging with local regulators, a step which could have helped it avoid falling foul of the bloc’s privacy rulebook.

This is not the first GDPR concern lobbed in ChatGPT’s direction, of course. Italy’s privacy watchdog, the Garante, generated headlines earlier this year after it ordered OpenAI to stop processing data locally — directing the US-based company to tackle a preliminary list of problems it identified in areas including lawful basis, information disclosures, user controls and child safety.

ChatGPT was able to resume offering a service in Italy fairly quickly after it tweaked its presentation. But the Italian DPA’s investigation continues and it remains to be seen what compliance conclusions may emerge once that assessment has been completed. Other EU DPAs are also probing ChatGPT, while in April the bloc’s data protection authorities formed a task force to consider how they should approach regulating the fast-developing tech.

That effort is ongoing — and it’s by no means certain a harmonized approach to oversight of ChatGPT and other AI chatbots will emerge — but, whatever happens there, the GDPR is still law and still in force. So anyone in the EU who feels their rights are being trampled by Big AI grabbing their data for training models that may spit out falsities about them can raise concerns with their local DPA and press for regulators to investigate, as is happening here.

OpenAI does not have a main establishment in any EU Member State for the purpose of GDPR oversight, which means it remains exposed to regulatory risk in this area across the bloc. So it could face outreach from DPAs acting on complaints from individuals in any Member State.

Confirmed violations of the GDPR, meanwhile, can attract penalties as high as 4% of global annual turnover. DPAs’ corrective orders may also force companies to rework how their technologies function if they wish to continue operating inside the bloc.
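
As a rough illustration of that ceiling: Article 83(5) of the GDPR caps fines for the most serious infringements at €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. The Python sketch below computes that upper bound. The function name and the turnover figure are hypothetical, chosen only for illustration; they are not drawn from the complaint or from OpenAI's accounts, and actual penalties are decided case by case by regulators.

```python
def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Upper bound on a fine under GDPR Article 83(5), in whole euros.

    The cap is the higher of a fixed EUR 20 million or 4% of total
    worldwide annual turnover for the preceding financial year.
    """
    fixed_cap_eur = 20_000_000
    # Integer arithmetic keeps the 4% share exact for whole-euro inputs.
    turnover_cap_eur = annual_turnover_eur * 4 // 100
    return max(fixed_cap_eur, turnover_cap_eur)


# Hypothetical turnover figure, for illustration only.
example_cap = gdpr_max_fine_eur(1_000_000_000)
print(f"Maximum fine: EUR {example_cap:,}")
```

For a smaller company the fixed €20 million figure applies instead, since it exceeds 4% of any turnover below €500 million.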

Complaint of unlawful processing for AI training

The 17-page complaint filed yesterday with the Polish DPA is the work of Lukasz Olejnik, a security and privacy researcher, who is being represented for the complaint by Warsaw-based law firm, GP Partners.

Olejnik tells TechCrunch he became concerned after he used ChatGPT to generate a biography of himself and found it produced a text that contained some errors. He sought to contact OpenAI, towards the end of March, to point out the errors and ask for the inaccurate information about him to be corrected. He also asked it to provide him with a bundle of information that the GDPR empowers individuals to get from entities processing their data when the information has been obtained from somewhere other than themselves, as was the case here.

Per the complaint, a series of email exchanges took place between Olejnik and OpenAI between March and June of this year. And while OpenAI provided some information in response to the Subject Access Request (SAR), Olejnik’s complaint argues it failed to produce all the information it must under the law, notably omitting information about its processing of personal data for AI model training.

Under the GDPR, for personal data processing to be lawful the data controller needs a valid legal basis, which must be transparently communicated. So obfuscation is not a good compliance strategy. The regulation also attaches the principle of fairness to the lawfulness of processing, which means anyone playing tricks to try to conceal the true extent of personal data processing is going to fall foul of the law too.

Olejnik’s complaint therefore asserts OpenAI breached Article 5(1)(a). Or, more simply, he argues the company processed his data “unlawfully, unfairly, and in a non-transparent manner”. “From the facts of the case, it appears that OpenAI systemically ignores the provisions of the GDPR regarding the processing of data for the purposes of training models within ChatGPT, a result of which, among other things, was that Mr. Łukasz Olejnik was not properly informed about the processing of his personal data,” the complaint notes.

It also accuses OpenAI of acting in an “untrustworthy, dishonest, and perhaps unconscientious manner” by being unable to comprehensively detail how it has processed people’s data.

“Although OpenAI indicates that the data used to train the [AI] models includes personal data, OpenAI does not actually provide any information about the processing operations involving this data. OpenAI thus violates a fundamental element of the right under Article 15 GDPR, i.e., the obligation to confirm that personal data is being processed,” runs another relevant chunk of the complaint (which has been translated into English from Polish using machine translation).

“Notably, OpenAI did not include the processing of personal data in connection with model training in the information on categories of personal data or categories of data recipients. Providing a copy of the data also did not include personal data processed for training language models. As it seems, the fact of processing personal data for model training OpenAI hides or at least camouflages intentionally. This is also apparent from OpenAI’s Privacy Policy, which omits in the substantive part the processes involved in processing personal data for training language models.

“OpenAI reports that it does not use so-called ‘training’ data to identify individuals or remember their information, and is working to reduce the amount of personal data processed in the ‘training’ dataset. Although these mechanisms positively affect the level of protection of personal data and comply with the principle of minimization (Article 5(1)(c) of the GDPR), their application does not change the fact that ‘training’ data are processed and include personal data. The provisions of GDPR apply to the processing operations of such data, including the obligation to grant the data subject access to the data and provide the information indicated in Article 15(1) of GDPR.”

It’s a matter of record that, when it was developing its AI chatbot, OpenAI did not ask individuals whose personal data it may have processed as training data for permission to use their information for that purpose. Nor did it inform the likely millions (or even billions) of people whose information it ingested in order to develop a commercial generative AI tool. That likely explains its lack of transparency when asked to produce information about this aspect of its data processing operations via Olejnik’s SAR.

However, as noted above, the GDPR requires not only a lawful basis for processing people’s data but transparency and fairness vis-a-vis any such operations. So OpenAI appears to have got itself into a triple bind here, although it remains to be seen how EU regulators will act on such complaints as they weigh how to respond to generative AI chatbots.

Right to correct personal data ignored

Another aspect of Olejnik’s beef with OpenAI fixes on errors ChatGPT generated about him when asked to produce a biography — and its apparent inability to rectify these inaccuracies when asked. Instead of correcting falsehoods its tool generated about him, he says OpenAI initially responded to his ask by blocking requests made to ChatGPT that referenced him — something he had not asked for.

Subsequently it told him it could not correct the errors. Yet the GDPR provides individuals with a right to rectification of their personal data.

“In the case of OpenAI and the processing of data to train models, this principle [rectification of personal data] is completely ignored in practice,” the complaint asserts. “This is evidenced by OpenAI’s response to Mr. Łukasz Olejnik’s request, according to which OpenAI was unable to correct the processed data. OpenAI’s systemic inability to correct data is assumed by OpenAI as part of ChatGPT’s operating model.”

Discussing disclosures related to this aspect of its operation contained in OpenAI’s privacy policy, the complaint goes on to argue: “Given the general and vague description of ChatGPT’s data validity mechanisms, it is highly likely that the inability to correct data is a systemic phenomenon in OpenAI’s data processing, and not just in limited cases.”

It further suggests there may be “reasonable doubts about the overall compliance with data protection regulations of a tool, an essential element of which is the systemic inaccuracy of the processed data”, adding: “These doubts are reinforced by the scale of ChatGPT’s processed data and the scale of potential recipients of personal data, which affect the risks to rights and freedoms associated with personal data inaccuracy.”

The complaint goes on to argue OpenAI “should develop and implement a data rectification mechanism based on an appropriate filter/module that would verify and correct content generated by ChatGPT (e.g., based on a database of corrected results)”, suggesting: “It is reasonable in the context of the scope of the obligation to ensure data accuracy to expect OpenAI to correct at least data reported or flagged by users as incorrect.”

“We believe that it is possible for OpenAI to develop adequate and GDPR-compliant mechanisms for correcting inaccurate data (it is already possible to block the generation of certain content as a result of a blockade imposed by OpenAI),” it adds. “However, if, in OpenAI’s opinion, it is not possible to develop such mechanisms — it would be necessary to consult the issue with the relevant supervisory authorities, including, for example, through the prior consultation procedure described in Article 36 of GDPR.”
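
The complaint does not spell out how such a filter would work. As a minimal sketch of the general idea only (a post-processing step that checks generated text against a database of corrections flagged by users), something like the following Python could apply. Every name here, including `CorrectionStore`, `flag`, `rectify` and the person “Jane Example”, is hypothetical and invented for illustration; this is not OpenAI's implementation, nor a design proposed in the complaint, and exact string matching is a deliberate simplification that would miss rephrased inaccuracies.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionStore:
    """Hypothetical database of corrected results, keyed by data subject."""

    # subject name -> {inaccurate statement: corrected statement}
    corrections: dict[str, dict[str, str]] = field(default_factory=dict)

    def flag(self, subject: str, inaccurate: str, corrected: str) -> None:
        """Record a correction reported by a user or the data subject."""
        self.corrections.setdefault(subject, {})[inaccurate] = corrected

    def rectify(self, generated_text: str) -> str:
        """Replace known inaccurate statements in a piece of model output."""
        for subject, fixes in self.corrections.items():
            if subject not in generated_text:
                continue  # this output does not mention the person
            for inaccurate, corrected in fixes.items():
                generated_text = generated_text.replace(inaccurate, corrected)
        return generated_text


# Illustrative usage with made-up data.
store = CorrectionStore()
store.flag("Jane Example", "born in 1970", "born in 1980")
output = store.rectify("Jane Example, born in 1970, is a researcher.")
print(output)
```

A filter of this kind only patches what the model outputs; it does not change the training data or the model itself, which is the part OpenAI told Olejnik it could not correct.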

Data protection incompatibility by design?

The complaint also seeks to spotlight what it views as a total violation of the GDPR’s principle of data protection by design and default.

“The way the ChatGPT tool was designed, taking into account also the violations described [earlier] in the complaint (in particular, the inability to exercise the right to rectify data, the omission of data processing operations for training GPT models) — contradicts all the indicated assumptions of the principle of data protection by design,” it argues. “In practice, in the case of data processing by OpenAI, there is testing of the ChatGPT tool using personal data, not in the design phase, but in the production environment (i.e., after the tool is made available to users).

“OpenAI seems to accept that the ChatGPT tool model that has been developed is simply incompatible with the provisions of GDPR, and it agrees to this state of affairs. This shows a complete disregard for the goals behind the principle of data protection by design.”

We’ve asked OpenAI to respond to the complaint’s claims that its AI chatbot violates the GDPR and also to confirm whether or not it produced a data protection impact assessment prior to launching ChatGPT.

Additionally, we’ve asked it to explain why it did not seek prior consultation with EU regulators to get help on how to develop such a high-risk technology in a way that could have mitigated GDPR risks. At the time of writing it had not responded to our questions, but we’ll update this report if we get a response.

We’ve also reached out to the Polish DPA about the complaint. However, EU DPAs don’t often have much to say on open complaints.

Discussing their expectations for the complaint, Olejnik’s lawyer, Maciej Gawronski, suggests the length of time it could take the Polish regulator, the UODO, to investigate could be “anything from six months to two years”.

“Provided UODO confirms violation of the GDPR we would expect UODO to primarily order OpenAI to exercise Mr Olejnik’s rights,” he told us. “In addition, as we argue that some of OpenAI’s violations may be systemic, we hope the DPA will investigate the processing thoroughly and, if justified, order OpenAI to act in compliance with the GDPR so that data processing operations within ChatGPT are lawful in a more universal perspective.”

Gawronski also takes the view that OpenAI has failed to apply Article 36 of the GDPR — since it did not engage in a process of prior consultation with the UODO or any other European DPA before launching ChatGPT — adding: “We would expect UODO to force OpenAI into engaging into a similar process now.”

In another step, the complaint urges the Polish regulator to require OpenAI to submit a data protection impact assessment (DPIA) with details of its processing of personal data for purposes related to ChatGPT — describing this document, which is a standard feature of data protection compliance in Europe, as an “important element” for assessing whether the tool is compliant with the GDPR.

For his part, Olejnik says his hope in bringing the complaint against OpenAI and ChatGPT is that he will be able to properly exercise all the GDPR rights he has found himself unable to so far.

“During this journey I felt kind of like Josef K, in Kafka’s The Trial,” he told us. “Fortunately, in Europe there’s a system in place to avoid such a feeling. I trust that the GDPR process does work!”

Natasha Lomas
