ICO’s Guidance on Privacy Enhancing Technologies (PETs)

In September 2022 the Information Commissioner’s Office (ICO) published draft guidance to assist organizations with implementing a ‘data protection by design and by default’ approach via techniques like data anonymization and pseudonymization, as well as through the use of privacy enhancing technologies (PETs).

PETs are techniques, software, and hardware solutions that permit data holders to do more with less data. They make it possible to use, in a secure and compliant manner, consumer data that would otherwise be considered too sensitive to share. According to the ICO’s draft guidance, PETs “help you to comply with the data minimization principle by ensuring you only process the data you need for your purposes, and provide an appropriate level of security for your processing.”

There are several categories of PETs that an organization can elect to use, but the draft guidance focused on the following: homomorphic encryption (HE), secure multiparty computation (SMPC), private set intersection (PSI), federated learning, trusted execution environments (TEE), zero-knowledge proofs (ZKP), differential privacy, and synthetic data.

Each category of PET has specific purposes and benefits to the organizational user.

Homomorphic Encryption

HE allows the user to perform computations on encrypted data—meaning it can be processed without ever being decrypted. Through HE, a data controller can utilize a key-generation algorithm to generate a public key and an evaluation key, which are used in conjunction by outside entities to perform computations and generate output. HE also generates a private key, which is retained by the controller so that they alone may decrypt the output and obtain the desired results.

By never decrypting the data, an organization using HE can minimize the risk of a damaging data breach. HE also preserves the accuracy of computations, because the data does not need to be, and cannot be, altered in order to perform computations with it.

There are several variants of HE that your organization can apply, each offering a different range of supported operations and a different balance of risk and computational overhead.
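
To make the key-generation, evaluation, and decryption flow described above concrete, here is a minimal sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. The tiny primes and specific values are illustration-only assumptions, not anything prescribed by the ICO guidance:

```python
import math
import random

# Toy Paillier keys. Real deployments use primes of ~1024 bits or more;
# these small primes exist purely so the arithmetic is easy to follow.
p, q = 1009, 1013
n = p * q
n2 = n * n
g = n + 1                        # public key: (n, g)
lam = math.lcm(p - 1, q - 1)     # private key component

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # private key component (modular inverse)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts -- the data
# holder never has to decrypt the inputs to compute their sum.
c_sum = (encrypt(5) * encrypt(7)) % n2
assert decrypt(c_sum) == 12
```

An outside party holding only the public key can compute `c_sum`; only the holder of `lam` and `mu` can read the result, which mirrors the controller-retains-the-private-key arrangement described above.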

Secure Multiparty Computation

SMPC is a set of rules for transmitting data between multiple computers—so that multiple parties can “jointly perform processing on their combined data, without needing to share all of their data with each of the other parties.” This process is made possible by a cryptographic technique called “secret sharing,” which splits a data value into fragments distributed among the parties: no individual fragment reveals anything on its own, and the underlying value can only be recovered by recombining a sufficient number of fragments.

SMPC substantially minimizes the risk of personal data breaches but is not accessible to all parties as the process is still an “evolving and maturing concept” that currently “requires technological expertise and resources.” Additionally, because the data inputs of the various participating parties remain secret during the computation process, there is an increased risk of inaccuracy in outputs.
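
As a rough illustration of the secret-sharing idea (a sketch only, with an arbitrarily chosen modulus and example salary figures), additive secret sharing lets compute nodes sum two parties’ private values without any node seeing either input:

```python
import random

Q = 2**31 - 1  # public modulus; all share arithmetic is done mod Q

def share(secret, n_parties=3):
    """Split a value into n random shares that sum to it mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Two parties secret-share their private salary figures across three nodes.
alice_shares = share(52_000)
bob_shares = share(48_000)

# Each node adds the pair of shares it holds; a single share is just a
# uniformly random number, so no node learns anything about either input.
sum_shares = [(a + b) % Q for a, b in zip(alice_shares, bob_shares)]

assert reconstruct(sum_shares) == 100_000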

Private Set Intersection

PSI is a type of SMPC that permits two parties, each possessing their own data set, to find the “intersection” of the two data sets, that is, the elements common to both, without requiring either data set to be revealed or shared.

This process provides data minimization and security as no data is shared. However, the same inaccuracy risks presented by SMPC are present with PSI. PSI presents additional challenges as the data processors must properly define the intersection size and parameters for their computational purposes and desired objective.
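
One common way to build PSI is a Diffie-Hellman-style double-blinding exchange. The sketch below is a simplification under stated assumptions: the modulus, the hash-to-group mapping, and the example email sets are all illustrative choices, and a production protocol would use a proper prime-order group:

```python
import hashlib
import random

P = 2**127 - 1  # a Mersenne prime used as a toy modulus

def hash_to_group(item):
    # Simplistic hash-to-group map -- adequate for a sketch only.
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, key):
    return {pow(hash_to_group(x), key, P) for x in items}

a_key = random.randrange(2, P - 2)   # Alice's secret exponent
b_key = random.randrange(2, P - 2)   # Bob's secret exponent

alice_set = {"ann@example.com", "bob@example.com", "carol@example.com"}
bob_set = {"bob@example.com", "dave@example.com"}

# Each party blinds its own items and sends them across; the other party
# blinds them again. H(x)^(a*b) is the same whichever order the two keys
# are applied, so common items match without either raw set being shared.
alice_double = {pow(v, b_key, P) for v in blind(alice_set, a_key)}
bob_double = {pow(v, a_key, P) for v in blind(bob_set, b_key)}

intersection_size = len(alice_double & bob_double)
assert intersection_size == 1   # only "bob@example.com" appears in both sets
```

Note that each side still sees blinded values for the *whole* of the other party’s set, which is why the intersection size and parameters must be defined carefully, as the paragraph above points out.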

Federated Learning

FL is a “technique which allows multiple different parties to train AI models on their own data (‘local’ models)” by combining patterns that the models identify into a “single, more accurate ‘global’ model, without having to share any training data with each other.”

There are two approaches to federated learning, each presenting its own benefits and challenges. Regardless of the approach you select, FL presents several opportunities for data breach or leakage, as information is exposed to multiple parties and to the machine learning model. As such, the draft guidance encourages implementing other PETs alongside FL to ensure that your organization is properly protected.
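
The local-model/global-model split described above can be sketched with federated averaging (FedAvg) on a toy one-parameter model. The data, learning rate, and round counts are illustrative assumptions; the point is simply that only model weights, never training data, leave each party:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Each party holds private (x, y) pairs drawn from y = 2*x plus noise.
def make_local_data(n=50):
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        data.append((x, 2.0 * x + random.gauss(0, 0.1)))
    return data

parties = [make_local_data() for _ in range(3)]

def local_train(w, data, lr=0.1, steps=20):
    """Gradient descent on a 1-parameter model y ~ w*x, local data only."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

w_global = 0.0
for _ in range(5):
    # Each party trains locally; only the resulting weight is shared.
    local_ws = [local_train(w_global, d) for d in parties]
    w_global = sum(local_ws) / len(local_ws)   # FedAvg: average the weights

assert abs(w_global - 2.0) < 0.15   # the global model recovers the trend
```

Even though no raw data is exchanged, the shared weights themselves can leak information about the training data, which is why the guidance suggests layering other PETs on top of FL.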

Trusted Execution Environments

A TEE is a “secure area inside a computing device’s central processing unit (CPU)” that “allows code to be run, and data to be accessed, in a way that is isolated from the rest of the system.” These environments combine software and hardware components that are isolated from the rest of the operating system, so the main system cannot read the code or data inside the TEE. Additionally, when applications run in the TEE, the only accessible data is that owned by the application, and any data it generates or uses is not accessible to external code.

Zero-Knowledge Proofs

A ZKP is a protocol where an individual (referred to as the “prover”) is able to prove to another party (referred to as the “verifier”) that they are in possession of information unknown to the verifier (referred to as a “secret”), without revealing the information itself.

Currently, ZKPs are used to confirm a person’s age, to demonstrate ownership of an asset without requiring proof of how that asset was acquired, or to support biometric authentication methods such as facial recognition. In verifying such claims, ZKPs utilize algorithms and functions that “provide a probable certainty as to whether the information is correct or not,” without requiring that the underlying information actually be disclosed, which would expose the organization to increased data breach risk.
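
The prover/verifier exchange can be made concrete with one round of a Schnorr-style proof of knowledge of a discrete logarithm. The parameters below are toy-sized assumptions chosen for readability, not production values:

```python
import random

# Public parameters: a prime modulus and a group element (toy sizes).
P = 2**127 - 1
G = 5

# The prover's secret x, and the public value y = G^x mod P.
x = random.randrange(2, P - 1)
y = pow(G, x, P)

# --- one round of the interactive proof ---
r = random.randrange(1, P - 1)
t = pow(G, r, P)                    # prover's commitment
c = random.randrange(1, P - 1)      # verifier's random challenge
s = (r + c * x) % (P - 1)           # prover's response; r masks x

# The verifier checks G^s == t * y^c without ever learning x:
# G^(r + c*x) = G^r * (G^x)^c, so the identity holds exactly when
# the prover really knows x.
assert pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier ends up with “probable certainty” (a cheating prover succeeds only by guessing the challenge) while the secret `x` is never transmitted.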

Differential Privacy

Differential privacy is a method of “measuring how much information the output of a computation reveals about an individual[,]” achieved by injecting “noise” into the dataset. “Noise is a random alteration of data in a dataset so that values such as direct or indirect identifiers of individuals are harder to reveal.” The amount of noise injected is governed by the “epsilon,” or “privacy budget/parameter”: the smaller the epsilon, the more noise is added. The level of noise determines the level of “plausible deniability” that a particular individual’s personal data is included in a dataset.

By using differential privacy, your organization can produce effectively anonymized outputs, so long as enough noise is injected to create uncertainty about which data input belongs to which consumer. However, the draft guidance encourages the use of differential privacy in the context of statistical analysis and broad trend generation, rather than to detect anomalies or detailed patterns within a dataset, as the noise can result in “poor utility” of a data set.
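
A minimal sketch of the standard Laplace mechanism shows how epsilon and sensitivity translate into noise on a published statistic. The ages, epsilon value, and query are illustrative assumptions:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def laplace(scale):
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

ages = [34, 29, 41, 52, 38, 45, 30, 27]        # raw personal data
true_count = sum(1 for a in ages if a >= 40)   # exact answer: 3

epsilon = 1.0      # the "privacy budget": smaller epsilon => more noise
sensitivity = 1    # adding/removing one person changes a count by at most 1

# Publish the noisy count instead of the exact one; any single person's
# presence or absence is masked by the injected noise.
noisy_count = true_count + laplace(sensitivity / epsilon)
```

Counting queries tolerate this noise well, which is why the guidance favors differential privacy for broad statistics over fine-grained anomaly detection, where the same noise would swamp the signal.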

Synthetic Data

Synthetic data is “artificial” data generated by data synthesis algorithms, which “replicate patterns and the statistical properties” of real data, such as its characteristics and structure. This process is useful when an organization does not want to, or cannot, share personal data but wants to convey the overall statistics of a data set.

The greatest risk with the use of synthetic data is that, for the generated information to accurately portray the real data and be of use to your organization, the synthetic data must closely mimic the real data. That very similarity can then allow a threat actor to re-identify the original data source.
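
As a rough illustration (the records, columns, and the naive per-column synthesizer are all assumptions of this sketch, not a method from the guidance), one can fit simple statistics to real records and sample fresh artificial rows that preserve the aggregates:

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# "Real" sensitive records: (age, salary) pairs.
real = [(random.gauss(40, 8), random.gauss(50_000, 9_000)) for _ in range(500)]

def synthesize(rows, n):
    """Fit each column's mean/stdev independently, then sample new rows."""
    cols = list(zip(*rows))
    params = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [tuple(random.gauss(m, s) for m, s in params) for _ in range(n)]

synthetic = synthesize(real, 500)

# Aggregate statistics carry over even though no synthetic row is a
# real record.
real_mean = statistics.mean(r[0] for r in real)
syn_mean = statistics.mean(s[0] for s in synthetic)
assert abs(real_mean - syn_mean) < 2
```

This independent-column synthesizer deliberately discards cross-column correlations; richer synthesizers reproduce the real data more faithfully, which is exactly the fidelity-versus-re-identification tension the paragraph above describes.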

The ICO is currently seeking feedback on this draft guidance until September 16, 2022.

* * * * * * *

For ADCG’s Breach Report and more news updates, covering the Federal Trade Commission’s (FTC) public comment hearing last week on its advance notice of proposed rulemaking, Indonesia’s plan to punish data breaches with jail time, and privacy laws under consideration in Ohio, Michigan, and Pennsylvania, click here.

To browse through our previously published articles and news alerts, please visit our website, and don’t forget to subscribe to receive free weekly Data and Cyber Governance news and Breach Reports directly to your email.

We have two guests lined up for new podcast episodes. New episodes are generally released on Thursdays, here. They can be enjoyed on Spotify and Apple Podcasts. Don’t forget to subscribe!

Our most recently released episodes:

77 | Privacy & Cybersecurity Whistleblowers: A New Trend?

76 | Privacy Governance v. Cybersecurity Governance

75 | Cybersecurity and Cyber Insurance: Claims, Costs, and Chaos
