De-Personalisation
The Honest Broker Service was established with reference to the Information Commissioner's Office (ICO) Anonymisation Code of Practice and the ICO's new guidance on anonymisation and pseudonymisation.
Data accessed in the Honest Broker Service Safe Setting has been de-personalised. It still carries a risk of disclosure, however, because its extra detail could allow an individual service user to be re-identified if additional safeguards and processes were not in place.
This extra granularity is needed during research in the Safe Setting to produce useful results.
Within the Safe Setting, the data is protected by additional controls (security, process and contractual controls). Researchers are trained in how to work with data in the Safe Setting and are bound by contracts that set out potential sanctions for misuse of the data.
The results released from the Safe Setting for publication are aggregate data that have been checked and can be published openly, because no individual can be identified from them.
As the data passes through these processes, it moves along what the ICO refers to as the “spectrum of identifiability”. This is outlined in detail in Chapter 2 (“How do we ensure anonymisation is effective?”) of the ICO's new guidance, which recently went out for consultation, and in this useful guide provided by Understanding Patient Data.
UK Research and Innovation have also published guidance on identifiability, anonymisation and pseudonymisation. It discusses the concept of identifiability and explains how techniques such as anonymisation and pseudonymisation can be used to reduce the risk that data used for research identifies individuals.
The steps taken are outlined below:
Preparing data for access within the Safe Setting:
- To de-personalise the data, the Honest Broker Service removes all direct identifiers, such as the Health and Care Number, name, address and full date of birth.
- We also consider secondary identifiers. These are characteristics that, when combined with other data, could potentially identify a person. For example, we reduce the identification risk from secondary identifiers by providing age bands rather than individual ages, and by aggregating location data to larger geographies rather than providing precise locations.
- We also consider the wider context of the analysis. For example, if the research focuses on a rare disease or relates to public events already well documented in the public record, it may not be possible to de-personalise the data effectively. Such issues are usually discussed when assessing the feasibility of a project. They could mean that the project does not progress. Alternatively, additional requirements may be imposed to protect the data, such as requiring researchers to attend the physical safe room to conduct their analysis rather than accessing the data remotely.
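The two rule-based steps above, removing direct identifiers and generalising secondary identifiers, can be sketched in Python. This is a minimal illustration under stated assumptions, not the Honest Broker Service's actual process. The field names (`hc_number`, `name`, `address`, `date_of_birth`, `postcode`), the five-year age bands and the use of a postcode's outward code as the "larger geography" are all hypothetical choices made for the example.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real extract will differ.
DIRECT_IDENTIFIERS = {"hc_number", "name", "address", "date_of_birth"}


def age_on(dob: date, ref: date) -> int:
    """Whole years between a date of birth and a reference date."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def age_band(age: int, width: int = 5) -> str:
    """Map an exact age to a band such as '40-44' (the 5-year width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def coarsen_postcode(postcode: str) -> str:
    """Keep only the outward code of a space-separated UK postcode, e.g. 'BT1 1AA' -> 'BT1'."""
    return postcode.strip().upper().split()[0]


def depersonalise(record: dict, ref: date) -> dict:
    """Drop direct identifiers, then replace exact age and postcode with coarser values."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age_band"] = age_band(age_on(record["date_of_birth"], ref))
    out["area"] = coarsen_postcode(out.pop("postcode"))
    return out


# Hypothetical record for illustration only.
record = {
    "hc_number": "0000000000",
    "name": "A Person",
    "address": "1 Example Street",
    "date_of_birth": date(1980, 6, 15),
    "postcode": "BT1 1AA",
    "diagnosis": "asthma",
}
print(depersonalise(record, date(2024, 1, 1)))
# {'diagnosis': 'asthma', 'age_band': '40-44', 'area': 'BT1'}
```

Judgements about the wider context, such as whether a rare condition makes a dataset identifiable, cannot be reduced to a rule like this. They remain part of the feasibility assessment.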
Outputs
- When a researcher has completed a piece of analysis, they submit an output request, which the Honest Broker Service team screens before release. We use agreed statistical disclosure control techniques to ensure that no individual can be re-identified from the output.
- We apply a threshold: the minimum number of observations (count of individuals) displayed in, or underpinning, a piece of analysis that must be met before the output is considered safe for release. The threshold commonly applied across Trusted Research Environments, and used by the Honest Broker Service, is 10.
- Released outputs are usually aggregate counts, results of statistical tests and graphs, and occasionally code.
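The threshold rule above can be illustrated with a short Python check. This sketch assumes the output is a table of counts of individuals keyed by category. The function names and the example table are hypothetical. The sketch covers only the primary threshold check. Statistical disclosure control in practice involves further checks that it does not attempt, and how zero counts are treated is also left as an assumption.

```python
from __future__ import annotations

from typing import Mapping

# Minimum count of individuals stated in the text above.
THRESHOLD = 10


def cells_below_threshold(counts: Mapping[str, int], threshold: int = THRESHOLD) -> list[str]:
    """Labels of cells whose count of individuals falls below the threshold."""
    return [label for label, n in counts.items() if n < threshold]


def is_safe_for_release(counts: Mapping[str, int], threshold: int = THRESHOLD) -> bool:
    """Primary threshold check only: every cell must meet the threshold."""
    return not cells_below_threshold(counts, threshold)


# Hypothetical output table: counts of individuals by age band.
table = {"40-44": 23, "45-49": 7, "50-54": 15}
print(cells_below_threshold(table))  # ['45-49']
print(is_safe_for_release(table))    # False
```

In this example the "45-49" cell would need to be revised, for instance by merging age bands, before the output could pass this check.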