The modern digital economy has driven an unprecedented centralisation of data, creating vast, internet-connected databases that act as both irresistible targets for malicious actors and systemic points of failure.⁴ When these repositories contain Personally Identifiable Information (PII), their compromise can lead to immediate and devastating consequences, including identity theft, financial fraud, and personal endangerment.⁵ The very architecture of these centralised systems—storing millions of records in a single logical location—means that a single successful intrusion can yield a disproportionately massive reward for attackers, a reality proven by the relentless cadence of large-scale data breaches affecting corporations and government agencies alike.¹ The security measures protecting these "honeypots" must be flawless, yet they are pitted against a determined and ever-evolving threat landscape, making a breach not a matter of if, but when.⁶
Beyond the immediate risk of a direct breach, centralisation creates a more subtle but equally pernicious threat known as correlation risk. While organisations may diligently protect overtly sensitive PII such as passport numbers or financial details, they often collect a wide array of seemingly innocuous data points: location check-ins, purchase histories, website browsing habits, or even smartwatch heart rate data. Individually, these data points may appear anonymous. However, when aggregated within a single, vast database, they can be cross-referenced and correlated—by malicious insiders, external attackers who have gained access, or even by the data controller itself—to de-anonymise individuals with alarming accuracy.² A landmark study showed how supposedly anonymous Netflix movie rating data could be correlated with public IMDb ratings to re-identify specific users.⁷
This risk is magnified because data is rarely static. Datasets from different sources, often acquired through mergers, third-party agreements, or data brokers, can be combined.⁸ A user's seemingly anonymous activity on one platform can be linked to their real-world identity from another, creating a composite "super-profile" without their explicit knowledge or consent. This process of re-identification can reveal intimate details about a person's life, beliefs, and vulnerabilities, transforming disparate, non-sensitive data points into a highly invasive and detailed personal dossier.³ The fundamental problem is that as datasets grow and are linked, the possibility of identifying an individual from a small number of unique data points approaches certainty, making the very concept of "anonymous data" in a centralised system a dangerously flawed assumption.
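The re-identification mechanism described above can be sketched as a simple linkage attack: two datasets that are each "anonymous" in isolation are joined on shared quasi-identifiers. The records, field names, and quasi-identifier set below are entirely synthetic, chosen only to illustrate the principle.

```python
# Linkage attack sketch: join a "de-identified" dataset with a public,
# identified dataset on quasi-identifiers (all data here is synthetic).

# Dataset A: direct identifiers removed, but quasi-identifiers retained.
anonymised_records = [
    {"zip": "02139", "birth_year": 1978, "gender": "F", "condition": "asthma"},
    {"zip": "90210", "birth_year": 1990, "gender": "M", "condition": "diabetes"},
]

# Dataset B: a public dataset that carries real names (e.g. a voter roll).
public_records = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1978, "gender": "F"},
    {"name": "Bob Example", "zip": "60601", "birth_year": 1985, "gender": "M"},
]

# The attacker only needs fields common to both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def link(anon, public, keys=QUASI_IDENTIFIERS):
    """Re-identify anonymised records by joining on the quasi-identifier tuple."""
    index = {tuple(p[k] for k in keys): p["name"] for p in public}
    matches = []
    for record in anon:
        name = index.get(tuple(record[k] for k in keys))
        if name is not None:
            # The "anonymous" medical record is now tied to a real identity.
            matches.append({"name": name, **record})
    return matches

print(link(anonymised_records, public_records))
```

Even with only three quasi-identifiers, the first anonymised record matches exactly one named individual, attaching a sensitive attribute to a real identity. As datasets grow and more fields overlap, such joins become increasingly deterministic, which is precisely why "anonymous" centralised data is a flawed assumption.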
This core problem is at the heart of our research into personal data, especially as it relates to identity: how can data be made useful to a relying party without placing either the relying party or the subject at risk? Leaving this problem unsolved has made delivering on the seven Laws of Identity almost impossible, because organisations could not benefit from implementing those laws.