ADCG Explainer: Synthetic Data Privacy
A new form of technology has hit the data governance market—synthetic data generation.
Adweek states that synthetic data is used “to retain the statistical and behavioral aspects of real data sets without compromising the privacy of those individuals from which the data was collected.” Likewise, this data can replicate “real-world data” that “would otherwise be impractical because of collection limitations or regulatory restrictions.”
In simpler terms, synthetic data generation analyzes real datasets, builds statistical models of them, and produces artificial records that can guide decision making, all without humans seeing the actual data.
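To make the idea concrete, here is a minimal Python sketch of that analyze, model, and generate loop. It is an illustration only, not how any particular vendor works: the records, the column names, and the helper functions fit_model and generate are all hypothetical, and the sketch assumes each numeric column is independent and roughly bell-shaped, which real tools do not rely on.

```python
import random
import statistics

# Hypothetical "real" records; every name and value here is invented.
real_records = [
    {"name": "Person A", "age": 34, "monthly_spend": 120.0},
    {"name": "Person B", "age": 45, "monthly_spend": 310.5},
    {"name": "Person C", "age": 29, "monthly_spend": 95.0},
    {"name": "Person D", "age": 52, "monthly_spend": 410.0},
    {"name": "Person E", "age": 38, "monthly_spend": 205.25},
]

# Only these columns are modeled; the direct identifier ("name") is never read.
NUMERIC_COLUMNS = ["age", "monthly_spend"]


def fit_model(records, columns):
    """Summarize each column by its mean and sample standard deviation."""
    model = {}
    for col in columns:
        values = [r[col] for r in records]
        model[col] = (statistics.mean(values), statistics.stdev(values))
    return model


def generate(model, n, seed=0):
    """Draw n artificial records from the fitted per-column summaries.

    Assumption: columns are independent and normally distributed.
    """
    rng = random.Random(seed)
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in model.items()}
        for _ in range(n)
    ]


model = fit_model(real_records, NUMERIC_COLUMNS)
synthetic_records = generate(model, n=1000)
```

The synthetic records carry no name field and are drawn from the summary statistics rather than copied from any individual. A simple model like this does not by itself guarantee privacy, so treat it as a teaching aid rather than a safeguard.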
Benefits of Synthetic Data Generation:
Synthetic data is useful to organizations for a variety of reasons.
Personally Identifiable Information (PII)
Synthetic data “solves” for concerns surrounding PII because it can be shared across business partners without the risk that an individual’s actual PII will be exposed. The synthetic data is based on “real data” obtained from individuals, but it is scrubbed of any original PII or sensitive data.
Gartner, the management consulting company that produced the Adweek article, predicted that the use of synthetic data would “reduce personal customer data collection in a way that avoids 70 percent of privacy violation sanctions.”
Marketing and Advertising
The use of AI and machine learning (ML), when paired with synthetic data generation techniques, is considered a “deepfake technology,” which is “a type of synthetic media that replaces existing videos or audio with synthetically generated images or audio.”
Deepfake technology has received some criticism, according to Adweek, but it has already been successfully implemented in advertising and marketing campaigns, and Adweek anticipates that it “will become a more frequent fixture of advertising campaigns as marketers strive to keep pace with emerging tech development.” In fact, Gartner expects that by 2025, “30% of outbound marketing messages from large organizations will be synthetically generated, up from less than 2% in 2022.”
Product Testing and Development
Synthetically generated image and video data is expected to “constitute more than 95% of data used for AI models by 2030.” One of the prevailing examples of this type of usage is in “training ML models to develop products and features that can raise business value by improving product quality, reducing costs and potentially uncovering new products or services in the process.”
Growth in the Market:
Adweek anticipates “a wider embrace” of synthetic data generation “within the next two to five years,” and predicts that it will become the “norm” in the marketing industry. To read more about the projected growth of this technique, review The Synthetic Data Generation Market Research Report published by Digital Journal. The report analyzes how synthetic data generation fits into the existing market landscape and projects the expected growth prospects before 2030, much of which can be attributed to the COVID-19 pandemic.
If your organization is interested in adopting this technique, you can begin by analyzing the data in your information system and how it is used in your business model to determine whether this synthesizing process would be of use.
After you have determined whether this technique is right for you, review synthetic data generation vendors and select those that generate data sets matching your organization’s objectives and existing “real life” data sets.
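One rough way to check whether a vendor’s output matches your “real life” data is to compare basic summary statistics column by column. The Python sketch below is an assumption-laden illustration, not a vendor API or a complete evaluation: the function name compare_columns, the column name, and the sample values are all hypothetical, and matching means and standard deviations is only a first sanity check.

```python
import statistics


def compare_columns(real, synthetic, columns):
    """Return the absolute gap in mean and standard deviation per column.

    Smaller gaps suggest the synthetic sample tracks the real one more
    closely on these two summary statistics (and only on these).
    """
    report = {}
    for col in columns:
        real_vals = [row[col] for row in real]
        syn_vals = [row[col] for row in synthetic]
        report[col] = {
            "mean_gap": abs(statistics.mean(real_vals) - statistics.mean(syn_vals)),
            "stdev_gap": abs(statistics.stdev(real_vals) - statistics.stdev(syn_vals)),
        }
    return report


# Hypothetical samples: a slice of internal data and a vendor's trial output.
real_sample = [{"order_value": v} for v in (20.0, 35.0, 50.0, 65.0)]
vendor_sample = [{"order_value": v} for v in (22.0, 33.0, 52.0, 61.0)]

fidelity_report = compare_columns(real_sample, vendor_sample, ["order_value"])
```

A fuller evaluation would also look at relationships between columns and at whether any synthetic record sits too close to a real individual, but a report like this can help narrow a vendor shortlist.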
As Digital Journal points out, the synthetic data generation market “is intensely competitive and fragmented because of the presence of several established players participating in various marketing strategies to expand their market share.” In this market, the competition among vendors is “centered on price, quality, brand, product differentiation, and product portfolio.” Two emerging options to consider are Gretel and Genalog.