Classic privacy-enhancing techniques do not provide genuine privacy protection and result in a loss of information
By classic anonymization, we mean all methodologies in which an original dataset is manipulated or distorted to hinder tracing back individuals. Typical examples of classic anonymization that we see in practice are generalization, suppression (or wiping), pseudonymization, and row and column shuffling.
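As a purely illustrative sketch, the snippet below applies some of these classic techniques (suppression, generalization, pseudonymization) to a small made-up table; the column names and values are hypothetical and chosen only to show the mechanics.

```python
# Illustrative sketch of classic anonymization on a made-up table (hypothetical columns/data).
import hashlib
import pandas as pd

df = pd.DataFrame({
    "name": ["Olivia", "Noah", "Emma"],
    "age": [26, 34, 41],
    "zip_code": ["1012AB", "3511CD", "5611EF"],
    "diagnosis": ["flu", "asthma", "flu"],
})

anonymized = df.copy()
# Suppression / wiping: remove the direct identifier entirely.
anonymized = anonymized.drop(columns=["name"])
# Generalization: replace exact ages with coarse age bands.
anonymized["age"] = pd.cut(df["age"], bins=[0, 30, 40, 50], labels=["<=30", "31-40", "41-50"])
# Generalization: truncate the zip code to a coarser region.
anonymized["zip_code"] = df["zip_code"].str[:2]
# Pseudonymization: replace the name with a salted hash, so rows stay linkable
# across tables but are no longer directly identifying.
anonymized["pseudonym"] = df["name"].apply(
    lambda n: hashlib.sha256(("fixed-salt" + n).encode()).hexdigest()[:10]
)

print(anonymized)
```

Note that every output row still corresponds one-to-one to an original individual, which is exactly the limitation described below.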
Manipulating a dataset with classic anonymization techniques results in two key disadvantages:
1. Distorting or manipulating a dataset decreases data quality (i.e. data utility) and can even render the dataset unusable.
2. Privacy risk is reduced but never eliminated, because the result is still the original dataset with a 1:1 relation between records and real individuals. Nowadays, individuals can often be traced back with access to only a few attributes (illustrated in the sketch further below).
This introduces the trade-off between data utility and privacy protection, where classic anonymization techniques always offer a suboptimal combination of both.
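To make that re-identification risk concrete, the minimal sketch below counts how many records are unique on a small set of quasi-identifiers; the column names and data are hypothetical. A combination of just a few attributes is often enough to single out a person.

```python
# Sketch: how many records are uniquely identified by only a few quasi-identifiers?
import pandas as pd

df = pd.DataFrame({
    "age_band":   ["31-40", "31-40", "<=30", "41-50", "<=30"],
    "zip_region": ["35",    "35",    "10",   "56",    "10"],
    "gender":     ["F",     "F",     "F",    "F",     "M"],
})

quasi_identifiers = ["age_band", "zip_region", "gender"]
group_sizes = df.groupby(quasi_identifiers).size()
unique_records = (group_sizes == 1).sum()

print(f"{unique_records} of {len(df)} records are unique on {quasi_identifiers}")
# Records that are unique on these few attributes can be linked back to an
# individual by anyone who knows those attributes from another source.
```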
Synthetic data: a game changer?
Synthetic data by Syntho fills the gaps where classic anonymization techniques fall short, by maximizing both data utility and privacy protection.
USP 1: privacy protection
The concept of privacy is fully embedded in, and a consequence of, the technology we apply. We generate an entirely new dataset of fresh data records. Information that identifies real individuals is simply not present in a synthetic dataset, and the 1:1 relation with the original data that we typically see with classic anonymization does not exist. For example, the record related to Olivia (26) does not exist in the synthetic dataset; not even a manipulated version of that specific record does.
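As a purely generic illustration of this principle (this is not Syntho's engine or method), the sketch below fits a simple generative model to a toy numerical table and then samples brand-new records from it: the sampled rows are drawn from the learned distribution and have no 1:1 link to any original row.

```python
# Generic illustration only: sample new records from a fitted generative model.
# NOT Syntho's engine; it merely shows the principle that synthetic rows are
# drawn from a learned distribution rather than derived from original rows.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy original data: two correlated numerical attributes (e.g. age and income).
age = rng.normal(40, 10, size=1000)
income = 1000 * age + rng.normal(0, 5000, size=1000)
original = np.column_stack([age, income])

# Fit a simple generative model to the original data ...
model = GaussianMixture(n_components=5, random_state=0).fit(original)
# ... and sample entirely new, fresh records from it.
synthetic, _ = model.sample(1000)

print("original mean: ", original.mean(axis=0))
print("synthetic mean:", synthetic.mean(axis=0))
```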
USP 2: data utility
We generate realistic synthetic data that can be used as if it were real data. Outcomes of data analysis on synthetic data will be (nearly) identical to the analysis results on the original data.
To demonstrate this fidelity, we provide a detailed quality report (a 70+ page document) and offer a joint evaluation of every synthetic dataset that we generate.
Ultimate test: even machine learning models can barely distinguish original data from our synthetic data. These outcomes are not merely interesting elements to show in our quality report; they are also the key performance measures that Syntho uses to optimize the Syntho Engine.
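One common way to run such a "can a model tell the difference?" test (a generic sketch, not necessarily the exact procedure in Syntho's quality report) is to label original rows 0 and synthetic rows 1, train a classifier to separate them, and check whether its ROC AUC stays close to 0.5.

```python
# Sketch: a discriminator test for synthetic data quality.
# AUC close to 0.5 -> the classifier can barely distinguish the two datasets;
# AUC close to 1.0 -> the synthetic data is easy to tell apart from the original.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def discriminator_auc(original: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Cross-validated ROC AUC of a classifier trying to tell the two datasets apart.

    Assumes both DataFrames share the same numeric columns.
    """
    X = pd.concat([original, synthetic], ignore_index=True)
    y = np.concatenate([np.zeros(len(original)), np.ones(len(synthetic))])
    clf = GradientBoostingClassifier(random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    return scores.mean()
```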
More on this topic can be found here: https://www.syntho.ai/synthetic-data-preserves-statistical-properties/
In conclusion: synthetic data by Syntho is the preferred solution when compromising on either data quality or privacy protection is not an option.