Responsible use of personal data means protecting identities. Before identifiable information can be collected, used or shared, researchers must consider legal and ethical requirements, such as privacy legislation and informed consent.
While access to data should ideally be as open as possible, access to sensitive and identifiable data should be as closed as necessary.
It’s possible to reduce the identifiability of data through techniques referred to as ‘de-identification’, ‘anonymisation’, or ‘de-personalising’. However, in the current age of big data and triangulation, where separate datasets can be combined to re-identify individuals, there is debate over whether any method can reliably remove all identifiable information.
This does not mean that data cannot be used or shared for research. But it does mean that well-defined approaches for managing and working with data must be implemented.
Managing identifiable data
Research data often needs to contain personal information to help with study administration and qualitative analysis. Establishing a well-defined data management plan before starting your research is the best way to meet ethical and privacy requirements through access control and data security.
A safe data management plan can include:
- control of access through physical or digital means, such as passwords
- encryption of data, particularly if it is being moved between locations
- never putting identifiable and unencrypted data on easily lost items such as USB keys, laptops and external hard drives
- taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information.
Five Safes: Working with sensitive data
The UK Data Service uses the Five Safes framework for controlled access to sensitive or confidential data. The five elements are safe data, safe projects, safe people, safe settings and safe outputs.
Australia has its own requirements. The Commonwealth Privacy Act 1988 sets out 13 Australian Privacy Principles, and most states have their own privacy legislation. The Office of the Australian Information Commissioner provides details.
Learn about CADRE, the ARDC co-investment project that’s establishing a shared and distributed sensitive data access management platform for the social sciences and related disciplines.
Data de-identification can protect individuals, organisations and businesses, and protect information such as the spatial location of mineral or archaeological findings or endangered species.
It’s not an exact science and judgement calls may still need to be made when de-identifying data. It’s also not a ‘magic bullet’ to share and publish sensitive data. De-identification should be considered within a range of activities to protect the privacy of research participants, such as obtaining informed consent for data sharing and controlling access to the data. The validity of some research may also be reduced if it uses de-identified data.
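One way location data can be made less precise is by rounding coordinates, so that a record points to a broad area rather than an exact site. The sketch below is an illustration under stated assumptions, not a recommended standard: the function name `generalise_coordinates`, the one-decimal precision and the sample coordinates are all hypothetical, and the right level of coarsening is one of the judgement calls described above.

```python
def generalise_coordinates(latitude, longitude, decimals=1):
    """Round a coordinate pair to reduce its spatial precision.

    One decimal place of latitude is roughly 11 km. The appropriate
    precision depends on how sensitive the site is (an assumption here).
    """
    return round(latitude, decimals), round(longitude, decimals)


sighting = (-37.81362, 144.96305)  # hypothetical sighting location
print(generalise_coordinates(*sighting))  # prints (-37.8, 145.0)
```

Coarser coordinates protect the site but may make the data less useful for fine-scale spatial analysis, which is the trade-off with research validity noted above.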
Best practice basics for managing de-identification
It’s critical to have a clear plan for managing identifiable data through all research stages and when publishing data. Understanding the requirements and risks will help inform the kinds of consent, data security, and access controls required.
Here are some tips to start your de-identification:
- plan de-identification early in the research as part of your data management planning
- make sure the consent process states the agreed level of anonymity and clearly sets out what may and may not be recorded, transcribed, or shared
- retain original unedited versions of data for use within the research team and for preservation
- create a de-identification log of all replacements, aggregations or removals made
- store the log separately from the de-identified data files
- identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags, such as <anon>…..</anon>
- for qualitative data (such as transcribed interviews or survey textual answers), use pseudonyms or generic descriptors rather than blanking out information
- digitally manipulate audio and image files to remove identifying information.
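The tips on pseudonyms, bracketed replacements and a de-identification log can be sketched in a few lines of Python. This is a minimal illustration, not a complete de-identification tool: the function name `pseudonymise`, the replacement map and the sample transcript are all hypothetical. Real transcripts still need manual review, because simple string matching misses nicknames, misspellings and indirect identifiers.

```python
import re


def pseudonymise(text, replacements):
    """Replace each known identifier with a bracketed pseudonym.

    `replacements` maps an identifying string to a pseudonym or generic
    descriptor chosen by the researcher (hypothetical values here).
    Returns the edited text and a log of every replacement made.
    """
    log = []
    # Handle longer identifiers first so "Jane Smith" is replaced before "Jane".
    for original in sorted(replacements, key=len, reverse=True):
        substitute = f"[{replacements[original]}]"
        pattern = re.compile(rf"\b{re.escape(original)}\b")
        text, count = pattern.subn(substitute, text)
        if count:
            log.append({"original": original,
                        "replacement": substitute,
                        "count": count})
    return text, log


transcript = "Jane Smith said she moved to Ballarat. Jane works at Acme Mining."
replacements = {
    "Jane Smith": "Participant 1",
    "Jane": "Participant 1",
    "Ballarat": "regional town",
    "Acme Mining": "employer",
}
clean_text, log = pseudonymise(transcript, replacements)
print(clean_text)
```

Because the log records the original identifiers, it is as sensitive as the unedited data and should be saved to a separate, access-controlled location rather than alongside `clean_text`.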
Australian and international resources
For more in-depth information on de-identification, explore the following Australian resources:
- Australian Government’s ‘De-identification Decision-Making Framework’ guide
- Office of the Australian Information Commissioner’s guidance on de-identification of data and information and guide to securing personal information
- Australian Government’s Guidelines for the Disclosure of Secondary Use Health Information
- ABS Data Confidentiality Guide
- Queensland Office of the Information Commissioner’s Guidelines: privacy and de-identification
- The Future of Privacy Forum: A visual guide to practical data de-identification
- Office of the National Data Commissioner: Assessing Data Requests
The following international resources are also available:
- US Department of Health & Human Services’ de-identification guide
- US National Institute of Standards and Technology’s guides to de-identifying government datasets and personal information
- UK Anonymisation Network’s Anonymisation Decision-Making Framework
- UK Data Service’s research data management advice
- UK Research Data Network’s resources list for managing personal data
- UK Information Commissioner’s Office’s Anonymisation guide
- UK Data Archive’s advice on anonymising qualitative data
- Irish Qualitative Data Archive’s tool for anonymising qualitative data.