2.4 Sensitive data

Description

In this unit you will learn to:

Manage sensitive data to ensure privacy and compliance with relevant national or international regulations

Learning resources

Presentation Slides

Sensitive Data

Sensitive personal data, as defined by the European Commission, includes:

Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs.
Information about trade union membership.
Genetic data and biometric data processed solely for the purpose of identifying a person.
Health-related data.
Data concerning the sex life or sexual orientation of an individual.

This type of data requires specific processing conditions to ensure protection and privacy.

For example, consider a person named "Alex Johnson":

Direct identifiers: Name, pictures.
Indirect identifiers: Workplace and occupation, such as "Prime Minister of UK between 2000-2004."

Challenges of working with Sensitive data

Working with sensitive data presents a number of challenges, particularly with respect to anonymisation and storage.

Techniques such as pseudonymisation, data masking and generalisation are used to prevent identification while retaining the utility of the data. These methods alter data to protect identities but still allow meaningful secondary analysis.

However, anonymisation techniques are not always foolproof against re-identification. Assessing the sufficiency of these techniques requires an assessment of the risk of re-identification.

The storage of sensitive data requires strict security measures. Data should be encrypted, access controlled and regularly audited. Secure storage environments, both physical and digital, are essential to protect sensitive information.

Security

When it comes to security, we need to revisit the storage issue we saw in the previous section. Both storage solutions and storage media are crucial aspects to consider, but from a slightly different angle:

Choose the medium with the lowest risk of exposure.

Beyond this perspective, we must then consider the levels of access we can provide to sensitive/personal data when it comes to sharing with internal or external partners:

**Types of access:

Open data - fully anonymised and no personal data identifiable
Restricted - still anonymised data
Closed data - sensitive data that cannot be properly curated.

Each option always depends on the specific research activity, the type of data we are working with and the composition of the project team, whether it is based in the same institution, in different institutions but in the same country, or if we then start to consider further regional aspects such as sharing data outside the EU and third countries. Different laws apply and specifics need to be carefully negotiated.

Finally, it is important to include security measures such as password protection and encryption in your data management strategy for sensitive data:

Password protection and encryption offer significant benefits in securing sensitive data. They prevent unauthorised access by ensuring that only those with the correct credentials can view or modify the data. Encryption converts data into an unreadable format that can only be deciphered with the correct decryption key. This layered protection helps maintain the confidentiality and integrity of sensitive information, providing a robust defence against data breaches and cyber threats.

Anonymisation

Stam and Diaz (2020) assert that anonymisation can bypass data protection laws since anonymised data lacks personal information. However, achieving full anonymisation is challenging, particularly for qualitative data that focus on specific contexts. When dealing with materials like interview transcripts, it’s safer to assume that personal data is being processed, necessitating compliance with data protection laws. Participants must be fully informed about the handling and third-party disclosure of their data.

As a summary of the steps towards anonymisation, we would like to point out the following ones from the UK Data Service guidance:

Remove direct identifiers
Aggregation
Generalisation
Restrict the upper or lower rangers
Anonymise geo-referenced data
Anonymise relational data

Each category carries its own methodology and it is important to understand which and how to implement the various anonymisation steps. We would also like to point you towards the Information Commissioner's Officer (ICO) guidance on effective anonymisation: Chapter 2: how do we ensure anonymisation is effective?

Chapter 2 of the ICO's draft guidance on anonymisation highlights the importance of effective anonymisation to reduce the risk of identifiability to a sufficiently low level. It emphasises that identifiability extends beyond names to any data that can distinguish individuals. The guidance stresses that achieving effective anonymisation requires a thorough understanding of data protection laws and techniques to mitigate the risk of re-identification. It also emphasises the need for continuous assessment and adaptation of anonymisation methods as part of a broader data protection strategy.

https://ico.org.uk/media/about-the-ico/documents/4018606/chapter-2-anonymisation-draft.pdf

https://forscenter.ch/wp-content/uploads/2023/03/qualitative-data-anonymisation_final.pdf

Tools

In the final part of this unit, we would like to point some tools that could help with the anonymisation and sharing data process:

AMNESIA
Tools recommended by UK Data Service
- Text anonymisation helper tool
- AxCrypt (commercial) – Encryption
7zip
Zendto

We would like to highlight two of the tools mentioned here:

AMNESIA is a tool developed by OpenAIRE: key features of the tool are:

Pseudonymisation
Masking
K-Anonymity
Demographic Statistics

You can see further details about the tool, here.

Zendto is a tool that supports filesharing. Key aspects of the tool are:

Secure encryption & decryption
Customisable and translation to 14 languages
Open Source & Free
Aligns with GDPR
Large file transfer

ZendTo is an open source file transfer and sharing solution designed to send and receive large files easily and securely. It's useful for organisations looking for a secure, private alternative to commercial file sharing services.

Feel free to explore the tool here.

References

CESSDA Training Team. (2020). CESSDA Data Management Expert Guide. CESSDA ERIC. https://doi.org/10.5281/zenodo.3820473

European Commission. (n.d.). What personal data is considered sensitive? Retrieved July 15, 2024, from https://commission.europa.eu/law/law-topic/data-protection/reform/rules-business-and-organisations/legal-grounds-processing-data/sensitive-data/what-personal-data-considered-sensitive_en#references

Information Commissioner's Office. (n.d.). Chapter 2: How do we ensure anonymisation is effective? Retrieved July 15, 2024, from https://ico.org.uk/media/about-the-ico/documents/4018606/chapter-2-anonymisation-draft.pdf

UK Data Service. (n.d.). Anonymisation step-by-step. Retrieved July 15, 2024, from https://ukdataservice.ac.uk/learning-hub/research-data-management/anonymisation/anonymisation-step-by-step/

Stam, A, Diaz, P. (2023). Qualitative data anonymisation: theoretical and practical considerations for anonymising interview transcripts. FORS Guide No. 20, Version 1.0. Lausanne: Swiss Centre of Expertise in the Social Sciences FORS. doi:10.24449/FG-2023-00020