2030 Geophysics Collections

Who will benefit

Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and federal), geophysicists, environmental researchers, data analysts

DOI

https://doi.org/10.47486/XN002

Program

Cross-NCRIS National Data Assets

The Challenge

Large volumes of geophysical data have been acquired by universities, industry and federal and state government agencies since the 1950s. Making the raw and high-resolution versions of this cross-NCRIS network data FAIR and integrated with existing government datasets is the challenge of the 2030 Geophysics Collections project.

The Response

The project makes rawer, high-resolution versions of AuScope-funded magnetotelluric (MT) and passive seismic (PS) data accessible online, compliant with the FAIR and CARE principles, and integrated with existing government datasets at the National Computational Infrastructure (NCI) and other sites, including TERN.

These datasets are suitable for programmatic access in high-performance computing environments at NCI. They lay the foundations for more rapid data processing by 2030 for next-generation, scalable and data-intensive computation, including data assimilation and computation using artificial intelligence and machine learning.

The project involves 9 elements:

A survey has been conducted of raw and other derivative, associated or other higher-processed geophysical data that could be part of an integrated national high-resolution reference collection. The survey initially focused on the AuScope-funded Magnetotelluric (MT), Passive Seismic (PS) and Distributed Acoustic Sensing (DAS) datasets.

Targeted raw geophysical datasets have been ingested and organised on the NCI filesystem so that they can be (re)processed with computational tools available within the NCI. Derivative versions have been linked back to the source datasets.

Geophysical data releases are now discoverable in the NCI Data Catalogue and catalogue metadata have been structured to enable ‘vertical’ integration between repositories that have a higher-level product but need to reference the rawer data at NCI. The data is also being made discoverable through the ARDC Research Data Australia service.

Where derivative data products hosted in other repositories need to reference less processed data at NCI, a review of relevant data catalogues has been undertaken to determine if they comply with the FAIR principles. Gaps and inconsistencies have been identified and priority issues targeted for remedial action.

Globally Unique Persistent Resolvable Identifiers (GUPRIs) have been assigned to each version of each dataset to support data citation and reproducibility. International standard identifiers can assist in disambiguating the people and organisations related to the acquisition, processing, publication and funding of the geophysics datasets.

A review has been undertaken to determine international community-preferred standards for raw and derivative geophysical datasets. Related domain-specific vocabulary standards will be assessed, and where relevant, the vocabulary will be hosted on the ARDC Research Vocabularies Australia service.

Learn more about the International Geophysical Standards Review.

Software suitable for NCI’s computing environments has been established with a focus on how to process raw geophysical data into higher-level products. Jupyter analysis notebook tutorials that make use of NCI’s scalable data analysis software environments have been developed.

Candidate FAIR Implementation Profiles (FIPs), compliant with current international standards, are being developed for use for the whole data ecosystem from acquisition to publication.

Projects are due to be completed by the end of July 2023, and final reports will be published.
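As a small illustration of how the persistent identifiers described above support citation, the sketch below normalises a DOI written in different forms and builds its resolver link. It assumes only the public https://doi.org/ resolver; the function names are hypothetical and are not part of any NCI or ARDC tooling.

```python
from urllib.parse import urlparse

# Public DOI resolver; the bare DOI is appended to this prefix.
DOI_RESOLVER = "https://doi.org/"


def normalise_doi(identifier: str) -> str:
    """Return the bare DOI (for example '10.25914/mtjg-jp22').

    Accepts a bare DOI, a 'doi:' prefixed string, or a resolver URL
    such as 'https://dx.doi.org/10.25914/mtjg-jp22'.
    """
    text = identifier.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # For a resolver URL, the DOI is the path without the leading slash.
        text = parsed.path.lstrip("/")
    # Every DOI starts with the '10.' directory indicator and has a suffix.
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"not a DOI: {identifier!r}")
    return text


def resolver_url(identifier: str) -> str:
    """Build the resolvable link for a DOI given in any accepted form."""
    return DOI_RESOLVER + normalise_doi(identifier)
```

For example, `resolver_url("doi:10.47486/XN002")` returns the same link that is listed for this project under DOI.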

The Outcomes

The project has created the foundations for a national, high-resolution geophysical data collection that:

vertically integrates source datasets at NCI with derivative products hosted elsewhere
enables horizontal integration of remotely sensed and other geophysical datasets hosted at NCI with observational datasets hosted at TERN or elsewhere
uses identifiers to link and cite the roles and organisations involved in each phase of a dataset.
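The idea of vertical integration can be sketched in a few lines: each derivative product records the identifier of the dataset it was derived from, so any product can be traced back to its source. This is a minimal illustration only. The record fields, the identifiers and the repository names below are hypothetical placeholders and do not reflect the NCI Data Catalogue schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry for one version of a dataset."""
    identifier: str                      # persistent identifier (placeholder values below)
    title: str
    repository: str                      # where this version is hosted
    derived_from: Optional[str] = None   # identifier of the less processed parent, if any


def trace_to_source(identifier: str,
                    catalogue: Dict[str, DatasetRecord]) -> List[DatasetRecord]:
    """Follow derived_from links from a product back to its raw source.

    Returns the chain starting at the requested record and ending at the
    record with no parent. Raises KeyError if a link points to an
    identifier that is missing from the catalogue.
    """
    chain: List[DatasetRecord] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None:
        if current in seen:
            raise ValueError(f"circular provenance at {current!r}")
        seen.add(current)
        record = catalogue[current]
        chain.append(record)
        current = record.derived_from
    return chain


# Placeholder records: a raw time series, a processed product, and a model.
EXAMPLE_CATALOGUE = {
    r.identifier: r
    for r in [
        DatasetRecord("example:raw-timeseries", "Raw MT time series", "NCI"),
        DatasetRecord("example:transfer-functions", "MT transfer functions",
                      "NCI", derived_from="example:raw-timeseries"),
        DatasetRecord("example:resistivity-model", "Resistivity model",
                      "Other repository", derived_from="example:transfer-functions"),
    ]
}
```

Calling `trace_to_source("example:resistivity-model", EXAMPLE_CATALOGUE)` walks from the model hosted elsewhere through the processed product to the raw time series, which is the kind of traceability the outcomes above describe.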

The project has created the NCI Geophysics Specialised Environments, which provide access to data, tools and online high-end environments for both cloud and high-performance computing (HPC). Software resources include:

the NCI-geophysics module, which integrates Python, Julia and R environments together with thousands of pre-built geophysics, geoscience and data science-related libraries – this module can be used for batch jobs on Gadi as well as through JupyterLab or Virtual Desktop apps on NCI’s Australian Research Environment
specialised geophysics software for various techniques along with notebooks and how-to instructions, including Magnetotellurics (MT), Seismic, Airborne Electromagnetics and multi-physics analysis software
NCI’s AI/ML environment that can be used for geophysics-based machine learning analysis and processing using GPU resources – an example tutorial that utilises this environment was developed for machine-learning-driven seismic Full Waveform Inversion (FWI)
a new JuliaGeo package.

All datasets associated with the project are being published in the NCI Data Catalogue. The datasets in the high-resolution geophysics collections are designed for machine actionability in HPC. FAIR Implementation Profiles (FIPs) will aid machine interoperability.

The NCI geophysics collections are also being harvested into the ARDC Research Data Australia service. Detailed guides on accessing the data are available on the geophysics community pages within the NCI Documentation website.

To address the lack of consistency in the standards used across geophysical datasets, the project has also initiated an International Geophysics Standards Review.

Who Will Benefit

Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and federal), geophysicists, environmental researchers and data analysts will benefit from the project’s core features:

new multi-geophysical research techniques to provide new insights into geophysical properties from the surface of the Earth to the core
scaling geophysics to exascale research communities with shared community codes built around high-performance and high-resolution datasets, enabling geophysicists from different disciplines to collaborate and share their processing and modelling workflows, results and analysis
increased confidence in decision making, enabling stakeholders to transparently trace data products back to the source and reproduce workflows.

Before this project, Australian geoscience researchers faced challenges handling high-resolution data. This posed a risk to AuScope’s future amid growing data size and complexity.

Collaborating with NCI enables open and FAIR access to curated datasets, software management, and streamlined processing and analyses, removing technical obstacles. Geophysical survey data processing, which took days or weeks, now happens within minutes at NCI.

This transformative capability has garnered global interest from researchers and industry alike.
Dr Rebecca Farrington, Director of Research Data Systems, AuScope

This project has provided Australian researchers access to a world-class computational and data science digital environment and platform suitable for future research in geoscience.

This includes integrating a diverse range of scientific software, assembling high-quality datasets within an HPC environment, and supporting collaborative research across the country.

This allows researchers to efficiently create workflows tailored to their specific use cases, leading to greater research innovation.
Dr Ben Evans, Deputy Director, HPC and Data Innovation, NCI

Key Resources

NCI AI/ML Environment
NCI Specialised Environment – Geophysics
NCI Geophysics Community
NCI Geophysics Collections:
- AuScope (2023): AuScope Magnetotellurics (MT) Collection. v1. NCI Australia.dataset. https://dx.doi.org/10.25914/mtjg-jp22
- AuScope; Research School of Earth Sciences (RSES), Australian National University (2023): AuScope Distributed Acoustic Sensing (DAS) Collection. v1. NCI Australia.dataset. https://dx.doi.org/10.25914/zr9f-1e98
- Cudahy, T. et al. (2023): National ASTER Map of Australia. v1. NCI Australia.dataset. https://dx.doi.org/10.25914/5f224f36ec890
- Research School of Earth Sciences (RSES), The Australian National University (2023): AusPass Passive Seismic Collection. NCI Australia.dataset. https://dx.doi.org/10.25914/zyay-2g34

Further Resources

Read AuScope’s article on the project.
Access conference materials on the project, including:
- “A path towards reproducible magnetotelluric (MT) time series processing on HPC”, presented at Australasian Leadership Computing Symposium (ALCS) 2023
- “The Known Knowns, the Known Unknowns and the Unknown Unknowns of Geophysics Data Processing in 2030”, presented at European Geosciences Union (EGU) General Assembly 2022
- “Building a National High-Resolution Geophysics Reference Collection for 2030 Computation”, presented at Australasian Exploration Geoscience Conference (AEGC) 2023
- “Using 2030 computational techniques to unleash the untapped potential of existing geophysical datasets in mineral exploration: Opportunities and challenges”, presented at the AEGC 2023 workshop “Scaling MT acquisition, processing, interpretation, and people”.

Contact the ARDC

Timeframe

October 2021 to July 2023

Current Phase

Complete

ARDC Co-investment

$400,000

Project lead

National Computational Infrastructure (NCI)

Related Projects

Aggregating and Integrating Data on Health Outcomes Associated with Bushfires at a National Scale

Bushfire Research Data Management Plans

Aggregated and Harmonised Fuel Data on a National Scale

Framework for Sharing Bushfire Data and Tools Between Jurisdictional Agencies
