Australian researchers are creating more data than ever before. Research data takes many hours to collect, curate and analyse, and often represents years of hard work. Research data also offers huge reuse potential.
Many datasets can be used again to answer new questions, but for that to happen they need to be stored and findable, and they need metadata describing what’s in them so that others can understand them.
To help tackle the significant challenge of data storage, the ARDC is partnering with Australian research organisations to co-invest in storage capacity to manage datasets of national significance through our Data Retention Project.
The project is not only about helping to purchase the storage capacity, explains Dr Max Wilkinson, Research Data Infrastructure Architect at the ARDC, who is managing the project. “The data collections of national significance will have metadata, so researchers will be able to find and use them to answer questions as part of their research without having to re-create them.
“Also, institutions will be able to retain these important collections as national assets.”
The petascale of research data
Astronomy Australia Ltd (AAL), which facilitates access for Australian-based astronomers to the best research infrastructure, is a partner in the Data Retention Project and is one of the organisations tackling the huge data storage challenges of the astronomy research community. These challenges are growing exponentially as advanced new observatories are built.
"The new Square Kilometre Array Observatory (SKAO), a next-generation global radio astronomy facility being built in Western Australia now, will produce multiple petabytes of data in a single night," explained Dr Robert (Xiaobin) Shen, Director of eResearch at AAL.
"That volume of data would fill an average smartphone in a second."
The huge datasets involved in astronomy mean Robert has big questions to answer regarding data storage: “How can I provide reliable and sufficient storage capacity to fulfil research needs? Even transferring data can be really challenging.”
“With the ARDC’s support we are partially able to support relevant data storage for the next 2 years.”
For researchers, the new storage hardware brings more time to analyse important data, according to Robert: “In the short term, this support solves their data storage concerns. Although it’s only 2 years, at least these have been guaranteed until then.”
Metadata - data about the data
Robert is excited about the opportunity to enrich data collections with metadata as part of the ARDC project, which will enable datasets to be found by researchers and used to help answer new research questions.
“That means we also have 2 years to improve the data management practice, enriching the collection level metadata, which I see not as a burden, but an opportunity.”
Although metadata and data management may fall into the ‘admin’ side of work for researchers, researchers see the value in ensuring their hard-earned data is findable, accessible, interoperable (usable across systems) and reusable (FAIR data).
“5 years ago when I spoke to researchers about metadata, they would ask ‘why should I bother?’ Nowadays, researchers come to us and ask ‘please help’. People are quite serious about metadata, wanting to mint DOIs, for example, to have a persistent identifier for the collections, which we do with the ARDC’s help,” said Robert.
Storing data collections of national significance
In the Data Retention Project, the ARDC is partnering with Australian organisations to support the underpinning storage capacity needed to maximise the impact of important data outputs of Australian research. The impact of data collections is maximised when Australian researchers have timely access to data collections that contain metadata and to stable, persistent infrastructure.
The ARDC is partnering with the following organisations to provide this infrastructure and store data collections of national significance.
Phase 1 co-investment partners:
- Monash University
- National Computational Infrastructure (NCI)
- Pawsey Supercomputing Centre
- Tasmanian Partnership for Advanced Computing (TPAC) at the University of Tasmania
- University of Melbourne
We’re also pleased to announce the successful Phase 2 co-investment partners, which will soon get their projects underway:
- Astronomy Australia Ltd
- AuScope (waiting for executed contract)
- Australian Plant Phenomics Facility
- BioPlatforms Australia
- University of New South Wales
- University of Queensland
The project partners will also be working together as part of the quarterly Data Retention Forum, to help build a definition of ‘Data Collections of National Significance’, and contribute to building a sustainable model to secure important and valuable data collections for the benefit of the Australian research sector.
“The Data Retention Project builds a bridge between Research Data Management and Operational Infrastructure Management,” said Max.
Alongside the Data Retention Project, the ARDC Institutional Underpinnings program is working with Australian universities to establish good practice and institutional solutions for research data management.
The ARDC and AAL are supported by the National Collaborative Research Infrastructure Strategy (NCRIS), an Australian Government program to deliver world-class research facilities so that Australian researchers can solve complex problems both here in Australia and around the globe.
Written by Jo Savill (ARDC). Reviewed by Dr Max Wilkinson (ARDC), Dr Robert (Xiaobin) Shen (AAL), Romy Pearse (AAL), Carmel Walsh (ARDC), Adelle Coote (ARDC).