Enhancing Metadata for Inclusive Research on Entrenched Disadvantage

Increasing the utility of important social science datasets for researchers.

Thematic research data commons: HASS and Indigenous

The Challenge

Australia is creating data systems, including public sector data assets, that can support research, policy analysis and evaluation in areas such as education, health, employment, inequality and disadvantage. However, administrative data and its associated metadata need to be improved so that this research can be conducted with considerably less time and effort and fewer resources.

The Response

This pilot project aimed to demonstrate how metadata can be enhanced, using as its example the metadata associated with Higher Education (HE) administrative data within the Person Level Integrated Data Asset (PLIDA), formerly known as the Multi-Agency Data Integration Project (MADIP).

The goal was to increase the data’s utility for research analysts by focusing on metadata content and good metadata practice, rather than on the technicalities of information management or on data quality itself. By significantly improving metadata quality and usability, the project aimed to enhance the utility of HE administrative data for researchers and policymakers.

HE administrative data refers to a comprehensive set of statistics related to higher education institutions.

The PLIDA dataset is currently used by over 200 research projects led by government, academia and private institutions. Integrated data assets hold a broad range of data that allow complex questions to be analysed, with new insights that aren’t available from a single data source.

The deliverables for the project were:

  • an outline and synthesised summary of existing metadata standards, with a focus on administrative and social science data
  • an assessment of existing metadata of HE administrative data (within PLIDA/MADIP and external to it)
  • a user experience report, based on targeted consultations, outlining perceived metadata shortcomings when working with PLIDA/MADIP/HE data, and users' metadata preferences
  • best-practice metadata elements for HE data (and for social science administrative data more broadly, as applicable)
  • guidelines for implementing best-practice metadata standards for HE data over time (and in the context of the ABS’s DataLab environment), relevant to data custodians and the ABS
  • a forward plan for administrative data (with social science relevance), with guidelines and accompanying notes on best-practice metadata implementation for data custodians and the ABS.

Who Will Benefit?

Social science researchers and analysts, and government data custodians and providers, particularly those who use PLIDA or contribute data to it.

The Partners

  • University of Queensland (project lead)
  • Australian Bureau of Statistics (ABS)
  • Australian Government Department of Education
  • Australian Research Data Commons (ARDC)

Outcomes

The major project outputs included:

  • metadata needs of research analysts working with integrated administrative social science data
  • good-practice metadata examples
  • an assessment of existing metadata of HE administrative data (within PLIDA/MADIP and external to it)
  • a user experience report, based on targeted consultations, outlining perceived metadata shortcomings when working with PLIDA/MADIP/HE data, and users' metadata preferences
  • a forward plan with guidelines for improving PLIDA metadata.

The expected longer-term outcomes of this project include:

  • improved knowledge and awareness among data custodians and providers of good-practice data curation and documentation, including the CARE and FAIR principles
  • improved, and continually improving, documentation of government administrative data within the DataLab environment
  • enhanced usability of PLIDA/MADIP Higher Education data for researchers
  • increased demand from social researchers to work with administrative data in the DataLab environment.

The pilot project has laid a crucial foundation for an upcoming, more extensive social sciences investment initiative within the ARDC HASS and Indigenous Research Data Commons: the Social Science Research Infrastructure Network. This large-scale project builds on the issues identified in the pilot and on insights from open co-design workshops with a range of stakeholders. It includes new partners and a much more comprehensive program of work to tackle some of the most pressing challenges in social science data infrastructure.

Key Resources

  • Learn more about this project, particularly developments in integrated administrative social science data (IASSD), by watching the webinar on the project presented at Social Sciences Week in September 2024.