Since the 1950s, universities, industry, and federal and state government agencies have acquired large volumes of geophysical data. The challenge for the 2030 Geophysics Collections project is to make the raw and high-resolution versions of these data, drawn from across the NCRIS network, FAIR and integrated with existing government datasets.
The project makes rawer, high-resolution versions of AuScope-funded magnetotelluric (MT) and passive seismic (PS) data accessible online, compliant with the FAIR and CARE principles, and integrated with existing government datasets at the National Computational Infrastructure (NCI) and other sites, including TERN.
These datasets are suitable for programmatic access in NCI’s high-performance computing environments. They lay the foundations for faster data processing by 2030, supporting next-generation, scalable and data-intensive computation, including data assimilation, artificial intelligence and machine learning.
The project involves 9 elements:
A survey has been conducted of raw, derivative, associated and more highly processed geophysical data that could form part of an integrated national high-resolution reference collection. The survey initially focused on the AuScope-funded magnetotelluric (MT), passive seismic (PS) and distributed acoustic sensing (DAS) datasets.
Targeted raw geophysical datasets have been ingested and organised on the NCI filesystem so that they can be (re)processed with the computational tools available at NCI. Derivative versions have been linked back to their source datasets.
Geophysical data releases are now discoverable in the NCI Data Catalogue, and the catalogue metadata have been structured to enable ‘vertical’ integration with repositories that hold higher-level products but need to reference the rawer data at NCI. The data are also being made discoverable through the ARDC Research Data Australia service.
Where derivative data products hosted in other repositories need to reference less processed data at NCI, a review of relevant data catalogues has been undertaken to determine if they comply with the FAIR principles. Gaps and inconsistencies have been identified and priority issues targeted for remedial action.
Globally Unique Persistent Resolvable Identifiers (GUPRIs) have been assigned to each version of each dataset to support data citation and reproducibility. International standard identifiers can assist in disambiguating the people and organisations related to the acquisition, processing, publication and funding of the geophysics datasets.
A review has been undertaken to determine the international community-preferred standards for raw and derivative geophysical datasets. Related domain-specific vocabulary standards will be assessed and, where relevant, the vocabularies will be hosted on the ARDC Research Vocabularies Australia service.
Software suited to NCI’s computing environments has been set up, with a focus on processing raw geophysical data into higher-level products. Jupyter notebook tutorials that make use of NCI’s scalable data analysis software environments have been developed.
Candidate FAIR Implementation Profiles (FIPs), compliant with current international standards, are being developed for use across the whole data ecosystem, from acquisition to publication.
Projects are due to be completed by the end of July 2023, and final reports will be published.
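Because each version of each dataset carries its own persistent identifier, a citation can pin down exactly which version was used. The following is a minimal Python sketch of that idea, assuming a citation layout similar to the collection references listed on this page. The `DatasetVersion` record, the `cite` and `normalise_doi` helpers and the simplified DOI pattern are hypothetical illustrations, not an NCI, ARDC or DataCite API.

```python
"""Sketch: build a version-specific dataset citation from a DOI.

Assumptions: field names and citation layout are illustrative only, loosely
following the collection references on this page; this is not an NCI,
ARDC or DataCite API.
"""

import re
from dataclasses import dataclass
from typing import Optional

# Simplified DOI check: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalise_doi(value: str) -> str:
    """Strip resolver prefixes such as 'https://dx.doi.org/' or 'doi:'."""
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "",
                 value.strip(), flags=re.IGNORECASE)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return doi


@dataclass(frozen=True)
class DatasetVersion:
    creator: str
    year: int
    title: str
    publisher: str
    doi: str                       # identifier for this specific version
    version: Optional[str] = None  # e.g. "v1"


def cite(ds: DatasetVersion) -> str:
    """Format a citation that resolves to exactly this dataset version."""
    parts = [f"{ds.creator} ({ds.year}): {ds.title}."]
    if ds.version:
        parts.append(f"{ds.version}.")
    parts.append(f"{ds.publisher}. Dataset.")
    parts.append(f"https://doi.org/{normalise_doi(ds.doi)}")
    return " ".join(parts)


if __name__ == "__main__":
    mt = DatasetVersion(
        creator="AuScope",
        year=2023,
        title="AuScope Magnetotellurics (MT) Collection",
        publisher="NCI Australia",
        doi="https://dx.doi.org/10.25914/mtjg-jp22",
        version="v1",
    )
    # Prints: AuScope (2023): AuScope Magnetotellurics (MT) Collection.
    #         v1. NCI Australia. Dataset. https://doi.org/10.25914/mtjg-jp22
    print(cite(mt))
```

Keeping the version and its DOI on the same record means a new release gets a new identifier, rather than silently changing what an existing citation points to. The `dx.doi.org` link is rewritten to the `https://doi.org/` resolver; both resolve to the same record.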
Who Will Benefit
Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and federal), geophysicists, environmental researchers and data analysts will benefit from the project’s core features:
- new research techniques that combine multiple geophysical methods to provide new insights into the Earth’s geophysical properties from the surface to the core
- scaling geophysics research communities to exascale computing, with shared community codes built around high-performance and high-resolution datasets, enabling geophysicists from different disciplines to collaborate and share their processing and modelling workflows, results and analyses
- increased confidence in decision making, enabling stakeholders to transparently trace data products back to the source and reproduce workflows.
The project has created the foundations for a national, high-resolution geophysical data collection that:
- vertically integrates source datasets at NCI with derivative products hosted elsewhere
- enables horizontal integration of remotely sensed and other geophysical datasets hosted at NCI with observational datasets hosted at TERN or elsewhere
- uses identifiers to link the roles and organisations involved in each phase of a dataset’s life cycle.
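Vertical integration of this kind amounts to following identifier links from a derived product back to the data it came from. The Python sketch below illustrates the idea under explicit assumptions: catalogue records are plain dictionaries whose `relatedIdentifiers` entries are loosely modelled on DataCite’s `IsDerivedFrom` relation type, and every identifier (prefixed `example:`) is a hypothetical placeholder rather than a real NCI or AuScope record. It does not describe how the NCI Data Catalogue is actually implemented.

```python
"""Sketch: trace a derived geophysics product back to its raw source data.

Assumptions (not from the project): records carry DataCite-style related
identifiers, and "IsDerivedFrom" points from a product to its input.
All identifiers below are hypothetical placeholders.
"""

from typing import Dict, List

# Hypothetical catalogue: identifier -> minimal metadata record.
CATALOGUE: Dict[str, dict] = {
    "example:mt-raw-v1": {
        "title": "Raw MT time series (hypothetical)",
        "relatedIdentifiers": [],
    },
    "example:mt-transfer-functions-v1": {
        "title": "MT transfer functions (hypothetical)",
        "relatedIdentifiers": [
            {"relationType": "IsDerivedFrom",
             "relatedIdentifier": "example:mt-raw-v1"},
        ],
    },
    "example:conductivity-model-v2": {
        "title": "Conductivity model hosted elsewhere (hypothetical)",
        "relatedIdentifiers": [
            {"relationType": "IsDerivedFrom",
             "relatedIdentifier": "example:mt-transfer-functions-v1"},
        ],
    },
}


def sources_of(identifier: str, catalogue: Dict[str, dict]) -> List[str]:
    """Identifiers this record says it was derived from.

    Records missing from the catalogue are treated as having no sources.
    """
    record = catalogue.get(identifier, {})
    return [
        rel["relatedIdentifier"]
        for rel in record.get("relatedIdentifiers", [])
        if rel.get("relationType") == "IsDerivedFrom"
    ]


def lineage(identifier: str, catalogue: Dict[str, dict]) -> List[str]:
    """Depth-first walk of IsDerivedFrom links, starting at `identifier`.

    Returns each identifier once, in the order first visited.
    """
    visited: List[str] = []
    stack = [identifier]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # skip shared inputs and guard against cycles
        visited.append(current)
        # Reverse so the first-listed source is visited first.
        stack.extend(reversed(sources_of(current, catalogue)))
    return visited


def raw_sources(identifier: str, catalogue: Dict[str, dict]) -> List[str]:
    """Upstream records that are not themselves derived from anything."""
    return [i for i in lineage(identifier, catalogue)
            if not sources_of(i, catalogue)]


if __name__ == "__main__":
    product = "example:conductivity-model-v2"
    print(lineage(product, CATALOGUE))
    print(raw_sources(product, CATALOGUE))
```

For the hypothetical conductivity model, `lineage` returns the model, the transfer functions and the raw MT time series in that order, and `raw_sources` returns only the raw MT record. Identifiers missing from the lookup table are treated as having no further sources, so a walk does not fail when a link points to a record held in another repository.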
The project has created the NCI Geophysics Specialised Environments, which provide access to data, tools and online high-end environments for both cloud and high-performance computing (HPC). Resources include:
- the NCI-geophysics module, which integrates Python, Julia and R environments with thousands of pre-built geophysics, geoscience and data science libraries; the module can be used for batch jobs on Gadi as well as through the JupyterLab or Virtual Desktop apps on NCI’s Australian Research Environment
- specialised geophysics software, notebooks and how-to instructions for techniques including magnetotellurics (MT), seismic, airborne electromagnetics and multi-physics analysis
- NCI’s AI/ML environment, which can be used for machine learning analysis and processing of geophysical data on GPU resources; an example tutorial using this environment was developed for machine-learning-driven seismic full waveform inversion (FWI)
- a new JuliaGeo package.
All datasets associated with the project are being published in the NCI Data Catalogue. The datasets in the high-resolution geophysics collections are designed to be machine-actionable in HPC environments, and FAIR Implementation Profiles (FIPs) will aid machine interoperability.
The NCI geophysics collections are also being harvested into the ARDC Research Data Australia service. Detailed guides on accessing the data are available on the geophysics community pages within the NCI Documentation website.
To address the lack of consistency in the standards used across geophysical datasets, the project has also initiated an International Geophysics Standards Review.
- NCI AI/ML Environment
- NCI Specialised Environment – Geophysics
- NCI Geophysics Community
- NCI Geophysics Collections:
  - AuScope (2023): AuScope Magnetotellurics (MT) Collection. v1. NCI Australia. Dataset. https://dx.doi.org/10.25914/mtjg-jp22
  - AuScope; Research School of Earth Sciences (RSES), Australian National University (2023): AuScope Distributed Acoustic Sensing (DAS) Collection. v1. NCI Australia. Dataset. https://dx.doi.org/10.25914/zr9f-1e98
  - Cudahy, T. et al. (2023): National ASTER Map of Australia. v1. NCI Australia. Dataset. https://dx.doi.org/10.25914/5f224f36ec890
  - Research School of Earth Sciences (RSES), The Australian National University (2023): AusPass Passive Seismic Collection. NCI Australia. Dataset. https://dx.doi.org/10.25914/zyay-2g34
- “A path towards reproducible magnetotelluric (MT) time series processing on HPC”, presented at the Australasian Leadership Computing Symposium (ALCS) 2023
- “The Known Knowns, the Known Unknowns and the Unknown Unknowns of Geophysics Data Processing in 2030”, presented at the European Geosciences Union (EGU) General Assembly 2022
- “Building a National High-Resolution Geophysics Reference Collection for 2030 Computation”, presented at the Australasian Exploration Geoscience Conference (AEGC) 2023
- “Using 2030 computational techniques to unleash the untapped potential of existing geophysical datasets in mineral exploration: Opportunities and challenges”, presented at the AEGC 2023 workshop “Scaling MT acquisition, processing, interpretation, and people”.