2030 Geophysics Collections

Making national high-resolution geophysics reference collections suitable for exascale computing in 2030
Who will benefit
Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and federal), geophysicists, environmental researchers and data analysts

The Challenge

Large volumes of geophysical data have been acquired by universities, industry and federal and state government agencies since the 1950s. Making the raw and high-resolution versions of this cross-NCRIS network data FAIR and integrated with existing government datasets is the challenge of the 2030 Geophysics Collections project.

The Response

The project makes rawer, high-resolution versions of AuScope-funded magnetotelluric (MT) and passive seismic (PS) data accessible online, compliant with the FAIR and CARE principles, and integrated with existing government datasets at the National Computational Infrastructure (NCI) and other sites, including TERN. 

These datasets are suitable for programmatic access in high-performance computing environments at NCI. They lay the foundations for more rapid data processing by 2030 for next-generation, scalable and data-intensive computation, including data assimilation and computation using artificial intelligence and machine learning.

The project involves 9 elements:

A survey has been conducted of raw, derivative, associated and more highly processed geophysical data that could form part of an integrated national high-resolution reference collection. The survey initially focused on the AuScope-funded magnetotelluric (MT), passive seismic (PS) and distributed acoustic sensing (DAS) datasets.

Targeted raw geophysical datasets have been ingested and organised on the NCI filesystem so that they can be (re)processed with computational tools available within the NCI. Derivative versions have been linked back to the source datasets.

Geophysical data releases are now discoverable in the NCI Data Catalogue, and catalogue metadata have been structured to enable ‘vertical’ integration with repositories that host a higher-level product but need to reference the rawer data at NCI. The data are also being made discoverable through the ARDC Research Data Australia service.

Where derivative data products hosted in other repositories need to reference less processed data at NCI, a review of relevant data catalogues has been undertaken to determine if they comply with the FAIR principles. Gaps and inconsistencies have been identified and priority issues targeted for remedial action.

Globally Unique Persistent Resolvable Identifiers (GUPRIs) have been assigned to each version of each dataset to support data citation and reproducibility. International standard identifiers can assist in disambiguating the people and organisations related to the acquisition, processing, publication and funding of the geophysics datasets.
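The role of per-version identifiers in citation and reproducibility can be illustrated with a minimal Python sketch. It assumes a simple in-memory registry; the class names, the `example:` identifier scheme, and the dataset titles are hypothetical and do not reflect the NCI Data Catalogue or any real identifier service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a dataset (all field values are hypothetical)."""
    identifier: str  # persistent identifier, unique to this version
    title: str
    version: str
    publisher: str
    year: int

    def citation(self) -> str:
        # The citation names the exact version, so a reader can retrieve
        # the same bytes that the original analysis used.
        return (f"{self.publisher} ({self.year}). {self.title}, "
                f"version {self.version}. {self.identifier}")


class IdentifierRegistry:
    """Toy stand-in for a resolver: maps each identifier to one version."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetVersion] = {}

    def register(self, record: DatasetVersion) -> None:
        # Global uniqueness: an identifier may never be reused.
        if record.identifier in self._records:
            raise ValueError(f"identifier already assigned: {record.identifier}")
        self._records[record.identifier] = record

    def resolve(self, identifier: str) -> DatasetVersion:
        # Resolvability: a known identifier always returns the same record.
        return self._records[identifier]


if __name__ == "__main__":
    registry = IdentifierRegistry()
    registry.register(DatasetVersion(
        "example:mt-survey/v1", "Example MT survey", "1.0",
        "Example Publisher", 2022))
    registry.register(DatasetVersion(
        "example:mt-survey/v2", "Example MT survey", "2.0",
        "Example Publisher", 2023))
    print(registry.resolve("example:mt-survey/v1").citation())
```

Because each version has its own identifier, a later reprocessing of the same survey does not silently change what an earlier citation points to.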

A review has been undertaken to determine international community-preferred standards for raw and derivative geophysical datasets. Related domain-specific vocabulary standards will be assessed, and where relevant, the vocabulary will be hosted on the ARDC Research Vocabularies Australia service.

Learn more about the International Geophysical Standards Review.

Software suitable for NCI’s computing environments has been established with a focus on how to process raw geophysical data into higher-level products. Jupyter analysis notebook tutorials that make use of NCI’s scalable data analysis software environments have been developed.

Candidate FAIR Implementation Profiles (FIPs), compliant with current international standards, are being developed for use for the whole data ecosystem from acquisition to publication.

Projects are due to be completed by the end of July 2023, and final reports will be published.

The Outcomes

The project has created the foundations for a national, high-resolution geophysical data collection that:

  • vertically integrates source datasets at NCI to derivative products hosted elsewhere
  • enables horizontal integration of remotely sensed and other geophysical datasets hosted at NCI with observational datasets hosted at TERN or elsewhere
  • uses identifiers to link and cite the roles and organisations involved in each phase of the dataset.
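Vertical integration, as described in the list above, amounts to each derivative product recording the identifier of the less processed dataset it came from. The following Python sketch shows how such links could be followed back to the source. The `derived_from` mapping, the function name, and the `example:` identifiers are assumptions made for illustration; they are not the project's actual metadata schema.

```python
from typing import Optional


def trace_to_source(identifier: str,
                    derived_from: dict[str, Optional[str]]) -> list[str]:
    """Follow 'derived from' links until a dataset with no parent is reached.

    Returns the chain of identifiers, starting at the given product and
    ending at the rawest dataset known to the mapping.
    """
    chain = [identifier]
    seen = {identifier}
    # Stop when the current dataset has no recorded parent.
    while derived_from.get(identifier) is not None:
        identifier = derived_from[identifier]
        if identifier in seen:
            # Guard against malformed metadata that links back on itself.
            raise ValueError(f"cyclic provenance at {identifier}")
        seen.add(identifier)
        chain.append(identifier)
    return chain


if __name__ == "__main__":
    # Hypothetical catalogue links: a model hosted elsewhere references
    # processed data, which references the raw time series.
    links = {
        "example:mt-model-v1": "example:mt-processed-v1",
        "example:mt-processed-v1": "example:mt-raw-v1",
        "example:mt-raw-v1": None,
    }
    for step in trace_to_source("example:mt-model-v1", links):
        print(step)
```

A chain of this kind is what lets a stakeholder start from a published product and work back to the data it was computed from.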

The project has created the NCI Geophysics Specialised Environments, which provide access to data, tools and online high-end environments for both cloud and high-performance computing (HPC).

Software includes:

  • the NCI-geophysics module, which integrates Python, Julia and R environments together with thousands of pre-built geophysics, geoscience and data science-related libraries – this module can be used for batch jobs on Gadi as well as through JupyterLab or Virtual Desktop apps on NCI’s Australian Research Environment
  • specialised geophysics software for various techniques along with notebooks and how-to instructions, including Magnetotellurics (MT), Seismic, Airborne Electromagnetics and multi-physics analysis software
  • NCI’s AI/ML environment that can be used for geophysics-based machine learning analysis and processing using GPU resources – an example tutorial that utilises this environment was developed for machine-learning-driven seismic Full Waveform Inversion (FWI)
  • a new JuliaGeo package.

All datasets associated with the project are being published in the NCI Data Catalogue. The datasets in the high-resolution geophysics collections are designed for machine actionability in HPC. FAIR Implementation Profiles (FIPs) will aid machine interoperability. 

The NCI geophysics collections are also being harvested into the ARDC Research Data Australia service. Detailed guides on accessing the data are available on the geophysics community pages within the NCI Documentation website.

To address the lack of consistency in the standards used across geophysical datasets, the project has also initiated an International Geophysics Standards Review.

Who Will Benefit

Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and federal), geophysicists, environmental researchers and data analysts will benefit from the project’s core features:

  • new multi-geophysical research techniques to provide new insights into geophysical properties from the surface of the Earth to the core
  • scaling geophysics to exascale research communities with shared community codes built around high-performance and high-resolution datasets, enabling geophysicists from different disciplines to collaborate and share their processing and modelling workflows, results and analysis
  • increased confidence in decision making, enabling stakeholders to transparently trace data products back to the source and reproduce workflows.

Before this project, Australian geoscience researchers faced challenges handling high-resolution data. This posed a risk to AuScope’s future amid growing data size and complexity.

Collaborating with NCI enables open and FAIR access to curated datasets, software management, and streamlined processing and analyses, removing technical obstacles. Geophysical survey data processing, which took days or weeks, now happens within minutes at NCI.

This transformative capability has garnered global interest from researchers and industry alike.

Dr Rebecca Farrington, Director of Research Data Systems, AuScope

This project has provided Australian researchers access to a world-class computational and data science digital environment and platform suitable for future research in geoscience.

This includes integrating a diverse range of scientific software, assembling high-quality datasets within an HPC environment, and supporting collaborative research across the country.

This allows researchers to efficiently create workflows tailored to their specific use cases, leading to greater research innovation.

Dr Ben Evans, Deputy Director, HPC and Data Innovation, NCI

Timeframe

October 2021 to July 2023

Current Phase

Complete

ARDC Co-investment

$400,000

Project lead

National Computational Infrastructure (NCI)