Large volumes of geophysical data have been acquired by universities, industry and federal and state government agencies since the 1950s. Making the raw and high-resolution versions of this cross-NCRIS network data accessible and FAIR and integrated with existing government datasets is the challenge of the 2030 Geophysics Collections project.
The project makes rawer, high-resolution versions of AuScope- and TERN-funded data accessible online, ensures compliance with the FAIR and CARE principles, and integrates the data with existing government datasets at the National Computational Infrastructure (NCI).
These datasets will be suitable for programmatic access in high-performance computing environments at NCI, laying the foundations for more rapid data processing by 2030 for next-generation, scalable, data-intensive computation, including artificial intelligence, machine learning and data assimilation.
The project involves 9 elements.
Geophysical data survey – Conduct a survey of raw, derivative, associated and more highly processed geophysical data that could form part of an integrated national high-resolution reference collection. The survey will initially focus on AuScope-funded Magnetotelluric (MT) and Passive Seismic (PS) datasets.
Data ingest and organisation – Targeted raw geophysical datasets will be ingested and organised on the NCI filesystem so that they can be (re)processed with computational tools available within the NCI. Derivative versions will be linked back to the source datasets.
Data publication – Geophysical data releases will be discoverable in the NCI data catalogue, and catalogue metadata will be structured to enable ‘vertical’ integration with repositories that hold a higher-level product but need to reference the rawer data at NCI. Data will also be discoverable in Research Data Australia.
Data repository coordination – Where derivative data products hosted in other repositories need to reference less processed data at NCI, a review of relevant data catalogues will be undertaken to determine if they comply with the FAIR principles. Gaps and inconsistencies will be identified and priority issues targeted for remedial action.
Identifiers – Unique identifiers will be assigned to each version of each dataset, including identifiers for the relevant funding agencies and for the various roles of persons and organisations involved in the acquisition, processing and publication of a dataset.
Geophysics data/metadata standards – A review will be undertaken to determine international community-preferred standards for raw and derivative geophysical datasets. Related domain-specific vocabulary standards will be assessed, and the vocabulary will be hosted at an ARDC vocab service or on fit-for-purpose infrastructure at NCI.
Scalable computing and data analysis – Establish software suitable for NCI’s computing environments that focuses on processing raw geophysical data into higher-level products. This includes Jupyter analysis notebooks that make use of NCI’s scalable data analysis software environments.
FAIR implementation profiles – A candidate FAIR implementation profile, compliant with current international standards, will be developed for potential use across the whole data ecosystem, from acquisition to publication.
Completion of projects – Projects are due to be completed by the end of May 2023 and final reports will be published.
Who Will Benefit
Researchers and research organisations, peak bodies, infrastructure providers, commercial eInfrastructure providers, governments (state and Commonwealth), geophysicists, environmental researchers and data analysts will benefit from the project’s core features:
- new multi-geophysical research techniques to provide new insights into geophysical properties from the surface of the Earth to the core
- scaling geophysics to exascale research communities with shared community codes built around high-performance, high-resolution datasets, enabling geophysicists from different disciplines to collaborate and share their modelling, workflows, results and analysis
- increased confidence in decision-making, with stakeholders able to transparently trace data products back to their source and to reproduce workflows.
Our partners are:
- NCI Australia
The 2030 project will create a national, high-resolution geophysical data collection that:
- vertically integrates from source datasets at NCI to derivative products
- enables horizontal integration of remotely-sensed datasets with observational datasets
- is linked by identifiers that cite the roles and organisations involved in any phase of the dataset.
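
As an illustration only, the idea of tracing a derivative product back to its source dataset through identifiers could be sketched as follows. This is a minimal sketch, not the project's actual schema: the class names, field names, role labels and the `example:` identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Contributor:
    identifier: str  # hypothetical placeholder for a person or organisation identifier
    role: str        # hypothetical role label, e.g. "acquisition" or "funding"


@dataclass
class DatasetRecord:
    identifier: str                     # hypothetical identifier for one dataset version
    title: str
    derived_from: Optional[str] = None  # identifier of the less processed parent, if any
    contributors: List[Contributor] = field(default_factory=list)


def trace_to_source(identifier: str, catalogue: Dict[str, DatasetRecord]) -> List[str]:
    """Follow derived_from links from a product back to its rawest source."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None:
        if current in seen:
            raise ValueError(f"circular derivation at {current}")
        seen.add(current)
        chain.append(current)
        current = catalogue[current].derived_from
    return chain


# Invented records: a raw MT survey, a processed version, and a derived model.
records = [
    DatasetRecord(
        "example:mt-raw-v1",
        "Raw MT time series",
        contributors=[Contributor("example:org-a", "acquisition")],
    ),
    DatasetRecord(
        "example:mt-processed-v1",
        "Processed MT transfer functions",
        derived_from="example:mt-raw-v1",
        contributors=[Contributor("example:org-b", "processing")],
    ),
    DatasetRecord(
        "example:mt-model-v1",
        "Resistivity model",
        derived_from="example:mt-processed-v1",
    ),
]
catalogue = {record.identifier: record for record in records}

for step in trace_to_source("example:mt-model-v1", catalogue):
    print(step, "-", catalogue[step].title)
```

Each record stores only a pointer to its immediate parent, so a higher-level product held in another repository would need just one identifier to reach the rawer data.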