ARDC Community Data Lab (CDL) Project

The ARDC Community Data Lab helps researchers use data from galleries, libraries, archives, museums and other collections.
Who will benefit
Researchers using data from libraries, museums, archives and other collections

The Challenge

Australian galleries, libraries, archives and museums hold a wealth of data on our history, culture, language and more. Traditional research using these collections involved days or weeks in a reading room, but many collections are now digitised and available online from anywhere.

The emergence of digitised collections has created exciting opportunities for data-driven research. However, computational methods have not traditionally been taught in universities and institutions, so researchers need new skills and tools to use them.

In 2022, the ARDC commissioned a report on findings from consultation with the research community about how they use Trove, an online platform run by the National Library of Australia. During the consultation, researchers from across the humanities, arts and social sciences (HASS) raised the multitude of diverse questions and approaches that could be brought to such a rich source of data. Given this diversity, it was recognised that establishing a way to pool approaches for research would be of value, by enabling researchers to use, reuse, share and enhance tools and datasets.

A recommendation of the report was the creation of a platform where tools, code and datasets that make use of data on Trove could be shared, organised and annotated by researchers. Such a platform would create a ‘collaboration layer’ on top of an improved Trove API.

The Response

The ARDC Community Data Lab (CDL) fosters the development of tools, datasets, and documentation that enable researchers to use data from libraries, museums, archives and other collections. It does this by:

  • gathering information on researcher needs through community co-design processes
  • partnering with other organisations to develop resources that meet identified researcher needs
  • creating frameworks and policies to guide the development of new resources
  • sharing details of new resources and supporting related initiatives for training in digital research skills, creating a pool of approaches for research.

By focusing on processes for efficient, collaborative, and sustainable development, the ARDC CDL will be able to respond quickly to new research needs. The co-design framework will foster connections between researchers and developers, building capacity and engagement.

Phase 1

Phase 1 of the ARDC CDL is focused on accessing and using data from Trove. Outcomes of Phase 1 include:

  • detailed documentation available through the Trove Data Guide, which creates a ‘collaboration layer’ for researchers to use the Trove API
  • example tools for text analysis, image annotation, and geospatial analysis
  • documented architectural principles and patterns.

Phase 1 has also engaged with a number of related ARDC projects.

Phase 2

Phase 2 of the ARDC CDL will be developed from a series of thematic co-design sessions in July 2024. We follow the co-design process described in the HASS and Indigenous Research Data Commons Co-Design Framework, which is based on established methods such as the TACSI Co-Design Framework. 

Get notified about the workshops by registering your interest via the HASS and Indigenous Research Data Commons.

Feedback on the project is welcome. Please contact us.

Who Will Benefit

  • Anyone interested in using collections in galleries, libraries, archives and museums (GLAM) for research will benefit from understanding the data held in those collections and the methods that can be used to analyse it.
  • GLAM institutions benefit both from greater exposure of and interest in their holdings, and from research outcomes that build upon those holdings.
  • The Digital Research Infrastructure sector will benefit from the development of best practices for building infrastructure in this way.

Outcomes

The outputs from Phase 1 include an initial suite of underpinning services and guides, which can be used right now:

  • ARDC BinderHub Service, on which a Jupyter notebook and JupyterHub-based approach to developing tools and guidance can be built
  • Trove Data Guide
  • GLAM Workbench, with enhancements focused on adding machine-readable metadata to Trove-related notebooks and datasets
  • Glycerine, an image annotation workbench
  • Spatio-temporal hotspot mapping data guide (access the notebook)
  • Searching on the Gazetteer of Historical Placenames guide (access the notebook)

Learn more and access them via the ARDC CDL service page. Read our guide to Trove for researchers.

Proof-of-concept services were also developed to test the viability of the architectural principles developed alongside them.

We recommend these outputs to developers interested in extending them. 

This project has delivered, and continues to deliver, outcomes for 3 different audiences: researchers, the GLAM sector, and the national digital research infrastructure sector.

The primary outcome is to facilitate easier, faster or new ways to access and benefit from GLAM sector holdings. We have done this by creating a growing set of guides and tools, which are available via our resources and services pages.

If you are a researcher comfortable with writing code, you may also be interested in outcomes for the national digital research infrastructure sector.

We are always seeking partnerships with the GLAM sector to focus on your holdings. Right now, we are doing this through guidance and tools for Trove, which aggregates the holdings of many GLAM institutions. Get in touch if you’re interested in drawing attention to your holdings.

If you are a GLAM sector employee comfortable with writing code, you may also be interested in the outcomes for the national digital research infrastructure sector.

We are seeking to grow a body of best practice to:

  • develop a ‘collaboration layer’ built upon existing holdings (especially via APIs)
  • document useful patterns of development
  • build a cohesive community of practice
  • identify and build out supporting services and activities to enable the above.

To this end, we have produced documentation and architectures.

Researcher Advisory Group

The ARDC Community Data Lab Researcher Advisory Group provides focused, specific input and feedback to the project team as the project progresses. This ensures that the project outputs are broadly applicable to researchers who would benefit from using data lab tools.

The group provides domain knowledge, independent critical thinking and advice on the defined project work packages and deliverables.

The members of the Researcher Advisory Group are:

  • Professor Catherine Travis, Chair of Modern European Languages at the College of Arts and Social Sciences, Australian National University
  • Dr Yorick Smaal, Senior Lecturer in History in the School of Humanities, Languages and Social Science, Griffith University
  • Dr Trent Ryan, Research Fellow to the Indigenous Data Network at the Melbourne School of Population and Global Health, University of Melbourne
  • Professor Adrian Vickers, Professor of Southeast Asian Studies, University of Sydney
  • Dr Mike Jones, Postdoctoral Research Fellow in the College of Arts and Social Sciences, Australian National University
  • Dr Terhi Nurmikko-Fuller, Senior Research Fellow at the Centre for Social Research & Methods, Australian National University
  • Jacinta Walsh, PhD Candidate, Monash Indigenous Studies Centre (MISC)
  • Dr Leah Henrickson, Lecturer in Digital Media and Cultures, School of Communication and Arts, The University of Queensland
  • Professor James Smithies, Director, Digital Research (HASS), Australian National University College of Arts and Social Sciences
  • Dr Imogen Wegman, Lecturer in Humanities, Office of the School of Humanities, University of Tasmania

Key Resources