Definitions of combinations of different datasets
Even when data may not seem sensitive, it can become sensitive through triangulation or data linkage.
In triangulation, pieces of information within the same dataset are not identifiable on their own, but become identifiable when combined with other data. For example, merging information about a participant’s age with information about a particular medical condition may make them identifiable.
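One rough way to see triangulation at work is to count how many records share each combination of attributes. The sketch below uses invented toy records and a hypothetical helper, `smallest_group`; it is an illustration under those assumptions, not a complete disclosure-risk assessment.

```python
from collections import Counter

# Invented toy records: no names or IDs, only age and medical condition.
records = [
    {"age": 34, "condition": "asthma"},
    {"age": 34, "condition": "asthma"},
    {"age": 34, "condition": "diabetes"},
    {"age": 71, "condition": "diabetes"},
    {"age": 71, "condition": "diabetes"},
    {"age": 71, "condition": "asthma"},
]


def smallest_group(rows, columns):
    """Size of the smallest group of rows that share the same values in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


# Each attribute on its own leaves every person hidden in a group of three.
print(smallest_group(records, ["age"]))               # 3
print(smallest_group(records, ["condition"]))         # 3

# Combined, some people become the only record with that age and condition.
print(smallest_group(records, ["age", "condition"]))  # 1
```

A smallest group of size 1 means at least one participant is unique on that combination of attributes, which is the situation the age and medical condition example describes.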
In data linkage, two or more anonymised datasets containing the same individuals are combined, which in turn increases the risk of identification. For example, combining a dataset of hospital episode statistics with a dataset of educational information may make it possible to identify individuals in the combined data, even though neither dataset identifies them on its own.
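The linkage risk can be sketched in a similar way. The example below assumes two invented, name-free datasets that happen to share two attributes (`birth_year` and `postcode_area`, both hypothetical field names) and uses a hypothetical helper, `unique_matches`, to find hospital records that line up with exactly one education record. Real linkage studies use far more careful methods and governance; this is only a minimal illustration.

```python
from collections import defaultdict

# Invented toy datasets; neither contains names or direct identifiers.
hospital = [
    {"birth_year": 2005, "postcode_area": "N1", "diagnosis": "epilepsy"},
    {"birth_year": 2005, "postcode_area": "E2", "diagnosis": "asthma"},
    {"birth_year": 2006, "postcode_area": "N1", "diagnosis": "diabetes"},
]
education = [
    {"birth_year": 2005, "postcode_area": "N1", "school": "School A"},
    {"birth_year": 2005, "postcode_area": "E2", "school": "School B"},
    {"birth_year": 2005, "postcode_area": "E2", "school": "School C"},
    {"birth_year": 2006, "postcode_area": "N1", "school": "School A"},
]

# Attributes assumed to be present in both datasets.
KEYS = ("birth_year", "postcode_area")


def unique_matches(left, right, keys):
    """Merge each left row with its right-hand match, but only when exactly one exists."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # an unambiguous link between the two datasets
            linked.append({**row, **matches[0]})
    return linked


linked = unique_matches(hospital, education, KEYS)
print(len(linked))  # 2
print(linked[0])
```

Here two of the three hospital records link unambiguously to a single education record, so a diagnosis and a school end up attached to the same person even though each dataset looked anonymous in isolation.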
For more on this issue, see the section on planning, and particularly triangulation, in the Research Data and Information Management guide by James Cook University.
Examples of projects with a combination of different datasets
Hardelid et al. (2014), a data linkage cohort study estimating the prevalence of chronic conditions in children who die in England, Scotland and Wales.
Resources
Privacy Meter, a GitHub repository with resources that enable users to evaluate the privacy risk of algorithms.
- Hardelid, P., Dattani, N., & Gilbert, R. (2014). Estimating the prevalence of chronic conditions in children who die in England, Scotland and Wales: a data linkage cohort study. BMJ Open, 4(8), e005331. doi:10.1136/bmjopen-2014-005331