ARDC Community Data Lab

Digital tools and guides for researchers accessing and using data from libraries, archives, museums and other collections

The ARDC Community Data Lab helps researchers apply digital tools and methods to find, access and use data from libraries, archives, museums and other collections.

It provides tools and guides for a range of common research tasks, including:

  • text analysis
  • data transformation
  • image annotation.

The ARDC Community Data Lab primarily provides tools and resources for humanities, arts and social sciences (HASS) researchers; however, the tools can also be used by researchers in other domains. It also provides guidance for researchers using Trove.

The ARDC Community Data Lab has 2 parts:

The ARDC Community Data Lab is an initiative of the ARDC’s HASS and Indigenous Research Data Commons, which is establishing national-scale data infrastructure for HASS and Indigenous research data communities.

Who Is Using the ARDC Community Data Lab?

The outputs of the Community Data Lab are being used by researchers interested in accessing materials available through the National Library of Australia’s Trove service.

Beyond using these outputs, researchers and developers are welcome to adapt, extend and share them as participants in the Lab.

Access the ARDC Community Data Lab Tools, Services and Guides

The Community Data Lab is a growing suite of services and guides, developed in response to the needs of the research community.

Trove is not just a website; it’s a source of data for new forms of digital research across a range of topics and disciplines. But where do you start?

The Trove Data Guide describes what data is available and shows you how to find and access it. It explores Trove’s possibilities for research, but also documents its problems and limits. The Trove Data Guide will help you approach Trove critically and understand how to integrate it into your research project.

The Trove Data Guide explores the different types of data available from Trove, covering:

  • what is Trove
  • understanding search
  • accessing data
  • digitised newspapers and gazettes
  • other digitised resources
  • research pathways.

The Trove sections of the GLAM Workbench complement the Trove Data Guide, providing tools, code, and examples to help you work with data from Trove. Through collaboration with the Community Data Lab, the GLAM Workbench has been updated to support version 3 of the Trove API, and to include machine-readable metadata describing notebooks and datasets.
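As an illustration of working with version 3 of the Trove API from Python, the sketch below builds a newspaper search request and pulls article headings out of a decoded JSON response. It is not code from the GLAM Workbench. The endpoint URL, the `category`, `q`, `encoding` and `n` parameters, the `X-API-KEY` header and the response layout are assumptions based on our reading of the Trove API v3 documentation; check them there before relying on them, and note that you need your own API key.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Trove API v3 search endpoint; confirm in the Trove API documentation.
TROVE_SEARCH_URL = "https://api.trove.nla.gov.au/v3/result"


def build_trove_request(query, api_key, category="newspaper", n=20):
    """Build (but do not send) a Trove search request.

    The parameter names (category, q, encoding, n) and the X-API-KEY
    header are assumptions based on the v3 documentation.
    """
    params = {"category": category, "q": query, "encoding": "json", "n": n}
    url = f"{TROVE_SEARCH_URL}?{urlencode(params)}"
    return Request(url, headers={"X-API-KEY": api_key})


def article_headings(response):
    """Collect article headings from a decoded newspaper search response.

    Assumes results sit under category -> records -> article, with each
    article carrying a "heading" field.
    """
    headings = []
    for cat in response.get("category", []):
        for article in cat.get("records", {}).get("article", []):
            headings.append(article.get("heading", ""))
    return headings
```

To actually send the request, you could pass it to `urllib.request.urlopen` and decode the body with `json.load` before handing the result to `article_headings`.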

The adoption of the International Image Interoperability Framework (IIIF) by cultural institutions creates an opportunity for seamless collaboration on image annotation. A limiting factor, however, has been the amount of development work required to implement a scholarly annotation system.
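IIIF's interoperability rests on shared URL and JSON conventions. For example, the IIIF Image API (version 3.0) lets any compliant client request a region of an image through a predictable URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such a URL; it is not part of Glycerine, and the server base URL and image identifier used with it are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API 3.0 request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be URL-encoded, including any "/" characters.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def xywh(x, y, w, h):
    """Region parameter for a pixel rectangle (x, y, width, height)."""
    return f"{x},{y},{w},{h}"
```

For instance, `iiif_image_url("https://iiif.example.org/images", "page-12", region=xywh(100, 200, 300, 400), size="600,")` asks a (hypothetical) server for a 600-pixel-wide crop of one region. Older Image API 2.x servers may expect `full` rather than `max` for the size parameter.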

Glycerine, built in partnership between the ARDC and Systemik Solutions, is a workbench for annotating and publishing IIIF images. It provides a suite of annotation tools and end-to-end workflows that let researchers, curators and students collaborate on projects across repositories.

Sets of annotations can combine semantic tags from domain-specific vocabularies with critical analysis in multiple languages. Annotated images can be published as research outputs in immersive and engaging visualisations and archived in sustainable formats.
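To illustrate how one annotation can combine a vocabulary tag with commentary in a given language, the sketch below builds a W3C Web Annotation (the annotation model IIIF builds on) that targets a rectangular region of a IIIF canvas. This is a generic representation, not Glycerine's internal or export format, and the annotation, canvas and vocabulary URIs are hypothetical.

```python
import json


def make_annotation(anno_id, canvas_id, region, tag_uri, comment, lang):
    """Build a W3C Web Annotation with a semantic tag and a comment.

    region is an (x, y, w, h) tuple in canvas coordinates.
    """
    x, y, w, h = region
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": [
            # Semantic tag: a term from a (hypothetical) domain vocabulary.
            {"type": "SpecificResource", "purpose": "tagging", "source": tag_uri},
            # Critical commentary, labelled with its language.
            {"type": "TextualBody", "purpose": "commenting", "value": comment,
             "language": lang, "format": "text/plain"},
        ],
        "target": {
            "source": canvas_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    anno = make_annotation(
        "https://example.org/anno/1",
        "https://example.org/canvas/p1",
        (10, 20, 300, 200),
        "https://vocab.example.org/concepts/ship",
        "Un navire à trois mâts.",
        "fr",
    )
    print(json.dumps(anno, indent=2, ensure_ascii=False))
```

Further commentary in other languages could be added as extra `TextualBody` entries, each with its own `language` value.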

Glycerine is hosted on the ARDC Nectar Research Cloud. Australians with .edu.au or .gov.au email addresses can access Glycerine for free.

Stylometry is the analysis of the language of texts using statistical methods. It has mostly been applied to literary texts.

The Stylometric Intelligent Archive (SIA) workbench offers workflows for some common stylometry methods. It accepts plain text but is especially suited to texts marked up in the Text Encoding Initiative (TEI) format. The workbench allows you to:

  • filter word counting by XML elements
  • segment texts as overlapping or non-overlapping blocks and by XML tags
  • store and edit text sets for repeated use.
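The filtering and segmentation steps above can be sketched in plain Python, assuming TEI P5 input in the standard TEI namespace. This is not SIA's implementation: the function names are hypothetical, the tokeniser is a simple letters-only regular expression, and nested elements with the same tag would have their words counted twice.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Letters-only tokens; digits, punctuation and apostrophes split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def words_in_elements(tei_xml, tag):
    """Lowercased word tokens found inside every TEI element named `tag`."""
    root = ET.fromstring(tei_xml)
    tokens = []
    for el in root.iter(f"{{{TEI_NS}}}{tag}"):
        # itertext() includes text of child elements (e.g. <hi> inside <l>).
        tokens.extend(WORD_RE.findall("".join(el.itertext()).lower()))
    return tokens


def segment(tokens, size, step=None):
    """Split tokens into blocks of `size` words.

    step == size (the default) gives non-overlapping blocks; step < size
    gives overlapping blocks. A trailing partial block is dropped.
    """
    step = size if step is None else step
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def block_counts(blocks):
    """Word counts for each block."""
    return [Counter(block) for block in blocks]
```

For example, `words_in_elements(doc, "l")` counts only verse lines and skips speaker names and stage directions, while `segment(words, 2000, step=1000)` would give overlapping 2,000-word blocks.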

SIA runs a caching system to speed up large and complex operations.

Within SIA, researchers can assemble and manage collections of texts, count the words within them, and then either run experiments on the word counts locally or export tables for analysis elsewhere.
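As an example of an experiment that can be run on word counts, the sketch below computes Burrows's Delta, a widely used stylometric distance, and writes a simple text-by-word count table as CSV for use in other tools. It is a minimal illustration under stated assumptions (most frequent words taken from the pooled counts, population standard deviation for the z-scores), not the method SIA itself uses.

```python
import csv
import io
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(texts, n_mfw):
    """Top n_mfw words across the pooled counts of all texts."""
    pooled = Counter()
    for counts in texts.values():
        pooled.update(counts)
    return [word for word, _ in pooled.most_common(n_mfw)]


def burrows_delta(texts, n_mfw=100):
    """Pairwise Burrows's Delta between texts (dict of name -> Counter)."""
    vocab = most_frequent_words(texts, n_mfw)
    names = sorted(texts)
    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for name in names:
        total = sum(texts[name].values())
        freqs[name] = [texts[name][w] / total for w in vocab]
    # z-score each word's frequency across the texts.
    z = {name: [] for name in names}
    for j in range(len(vocab)):
        column = [freqs[name][j] for name in names]
        mu, sd = mean(column), pstdev(column)
        for name in names:
            z[name].append(0.0 if sd == 0 else (freqs[name][j] - mu) / sd)
    # Delta is the mean absolute difference of the z-scores.
    return {
        (a, b): mean(abs(za - zb) for za, zb in zip(z[a], z[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def count_table_csv(texts, vocab):
    """Export a text-by-word count table as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["text", *vocab])
    for name in sorted(texts):
        writer.writerow([name, *(texts[name][w] for w in vocab)])
    return out.getvalue()
```

Smaller Delta values indicate texts whose most frequent words are used in more similar proportions, which is why the measure is often used for authorship attribution.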

Researchers can use the SIA either through a Jupyter Book or through its no-code user interface. The SIA is hosted on the ARDC Nectar Research Cloud. It is open access, requiring only a 2-step registration with a confirmation email.

Computational notebooks combine explanatory text with executable code in a familiar, document-like structure. This makes them useful not only for explaining a research method or question, but also for demonstrating the calculations, analysis or results (e.g. a graph). They are also editable like any other document, so you can adapt a related notebook for your own purposes. However, notebooks must be run in a dedicated environment.

The ARDC provides 2 services to help researchers run computational notebooks:

ARDC Jupyter Notebook Service

If you have fairly simple needs for computational notebooks, our Jupyter Notebook service may suffice. If you want to run someone else’s notebook, you might like to start here.

ARDC BinderHub Service

Our BinderHub Service allows researchers to load computational notebooks in a custom JupyterHub environment. When properly configured, the files stored alongside the notebooks load all the software (and possibly data) needed for the notebooks to run without issue. Notebook developers can share links that open their notebooks directly in the service, which makes it much easier for others to reuse those notebooks without having to download data and install software themselves.
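BinderHub builds each environment with repo2docker, which reads configuration files such as `requirements.txt` or `environment.yml` from the notebook repository. A launch link for a GitHub repository then follows the pattern `<binderhub>/v2/gh/<owner>/<repo>/<ref>`. The sketch below assembles such a link; the BinderHub host, repository and notebook path are hypothetical, so substitute the ARDC BinderHub address and your own repository. The `labpath` parameter, which asks JupyterLab to open a specific file, is assumed to be supported by the target BinderHub.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(binderhub, owner, repo, ref="HEAD", notebook=None):
    """Build a BinderHub launch link for a GitHub repository.

    Pattern: <binderhub>/v2/gh/<owner>/<repo>/<ref>
    If `notebook` is given, it is passed as `labpath` so JupyterLab opens
    that file (assumed to be supported by the target BinderHub).
    """
    # Encode each path part so branch names containing "/" survive.
    spec = "/".join(quote(part, safe="") for part in (owner, repo, ref))
    url = f"{binderhub.rstrip('/')}/v2/gh/{spec}"
    if notebook:
        url += "?" + urlencode({"labpath": notebook})
    return url
```

A notebook author could place a link generated this way in a README so readers launch the notebook with one click.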

A data guide has been produced with Jupyter notebooks demonstrating hotspot analysis and searching the Gazetteer of Historical Australian Placenames (GHAP) using its API.

Hotspot analysis

Searching the Gazetteer of Historical Australian Placenames
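The guide's own notebooks are the reference for its methods; this page does not say which hotspot statistic they use. One common choice is the Getis-Ord Gi* statistic, sketched below in plain Python with binary distance-band weights. It assumes projected coordinates (for example, metres) rather than raw longitude and latitude, needs at least two points, and is an illustration only; for real work a spatial statistics library such as PySAL is a better option.

```python
import math


def getis_ord_gi_star(points, values, distance):
    """Getis-Ord Gi* z-score for each point.

    points: list of (x, y) in projected units (e.g. metres)
    values: attribute value at each point (e.g. a count of place names)
    distance: neighbourhood radius; weights are 1 within it, else 0.
    Each point counts as its own neighbour (the "star" variant).
    """
    n = len(values)
    x_bar = sum(values) / n
    # Clamp at zero so rounding error cannot produce a negative variance.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - x_bar ** 2))
    scores = []
    for xi, yi in points:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= distance else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - x_bar * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator (no variation, or every point a neighbour) gives 0.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores
```

Positive scores mark clusters of high values and negative scores clusters of low values; a score above about 1.96 is often read as significant at roughly the 95% level, before any correction for testing many locations at once.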

Can’t Find What You’re Looking For?

We’ll be running co-design workshops with the research community to design the next phase of the Community Data Lab. Get notified about the workshops by registering your interest in our HASS and Indigenous Research Data Commons.

You can also let us know your research infrastructure needs by contacting us.