New Data Platforms Will Help Transform Australian Research


The Australian Research Data Commons (ARDC) is excited to announce a new investment of $9.7 million, with $15.5 million in co-investments from collaborating organisations, in 16 new platform projects.

In September, the ARDC invited the Australian research community to submit proposals for research-orientated platforms that enable researchers to collect or generate data, analyse that data and produce outputs that could be made Findable, Accessible, Interoperable, and Reusable (FAIR). We were particularly interested in transformative platforms that encouraged radical changes in the way research is conducted and/or dramatically increased the speed of research.

The breadth of new investment across biosecurity, earth and environmental systems, humanities, arts and social sciences (HASS), complex biology, and medical and health research ensures Australia’s world-class research system continues to improve the health of Australians, foster economic growth and support a healthy environment.

Following a careful and thorough review process involving an international panel, the following platform projects were successful.

Lead: Federation University Australia

The current AgReFed platform supports collaboration and novel insights in agricultural research, development and policy by improving the discoverability of trusted, reusable and analysis-ready agricultural research data across Australia.

This project will enhance the AgReFed platform to enable a transformation in the way that agricultural researchers collect, describe, and disseminate their research findings.

The outcome will be the ability to search and discover trusted, reusable agriculture-related datasets, workflows and models. This will facilitate data reuse and cross-discipline collaborations for novel research insights and practical applications in policy, reporting and on-ground decision making.

Lead: South Western Sydney Local Health District

We envisage a day when all patients receive evidence-based personalised therapy. Huge amounts of clinical and imaging data in hospital records are currently inaccessible for research. This data can give insights into prognosis, treatment and outcome.

The Cancer Data Network project will establish a nationally agreed capability to link regular treatment (clinical practice) data and clinical trial data, for machine learning analysis with international links. The data analysis is performed wherever the data resides, allowing learning across jurisdictions. This will improve data accessibility and provide governance structures to support data users including clinicians, data scientists, governments and policy makers.
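The idea of running the analysis wherever the data resides can be illustrated with a minimal sketch. It is not the project's software: the function names are hypothetical and the values are invented. Each site summarises its own records locally, and only those aggregates are pooled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LocalSummary:
    """Aggregates that leave a site; no patient-level rows are included."""
    count: int
    total: float


def summarise_locally(survival_months: Sequence[float]) -> LocalSummary:
    """Runs inside a site's own environment on its own records (hypothetical)."""
    return LocalSummary(count=len(survival_months), total=sum(survival_months))


def pooled_mean(summaries: Sequence[LocalSummary]) -> float:
    """Combines per-site aggregates into one cross-site estimate."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records reported by any site")
    return sum(s.total for s in summaries) / n


# Invented values standing in for records held at two separate sites.
site_a = summarise_locally([12.0, 30.0, 18.0])
site_b = summarise_locally([24.0, 36.0])
print(pooled_mean([site_a, site_b]))
```

Only the count and total cross the site boundary in this sketch, which is the property that lets learning span jurisdictions without moving the underlying records.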

It will expand the scale and accessibility of the existing Australian Computer Assisted Theranostics Network (OzCAT) by linking to the Cancer Alliance Queensland’s QLD Oncology On-Line (QOOL) platform; an early adoption of QOOL in Victoria; and the Cancer Variations (CaVa) project in NSW.

Lead: Queensland University of Technology (QUT)

Social media is a rich source of data that researchers from the social and natural sciences increasingly incorporate into their research. The Digital Data Observatory project aims to:

  • establish a national platform for accessing and analysing dynamic digital data, including existing collections of national interest across Twitter, Flickr, YouTube, Reddit, Instagram and gaming platforms such as Steam.
  • create an interdisciplinary team of researchers and support staff at key locations around Australia to provide transferable skill sets that enable the use of these and future resources in projects of national interest.
  • improve the availability and scalability of existing infrastructures and enable reuse of rich data resources with other nationally significant platforms.

This new national capability for developing, managing, and utilising large-scale research data will ensure equitable (FAIR) access to dynamic data with analytics support across the sector; and enable new forms of research, including usage of multiple social media platforms on topics of global interest, e.g. disinformation on COVID-19.

Lead: Swinburne University of Technology

The Australian Electrophysiology Data Analytics PlaTform (AEDAPT) project will create a national platform for reproducible electrophysiology data analysis and sharing, accessible to all Australian researchers across a wide range of disciplines that conduct electrophysiological research. AEDAPT will adapt an open-source community-driven platform that has already been applied to neuroimaging at the University of Queensland.

Using containers for reproducible analysis pipelines, AEDAPT will make state-of-the-art analysis tools highly accessible to researchers from universities, industry, and clinical settings. By being interoperable with other analysis platforms such as the Australian Imaging Service (AIS), Characterisation Virtual Lab (CVL) and BrainLife, AEDAPT will act as a catalyst for scaling up electrophysiology and multimodal neuroimaging research to large national and international collaborative research projects addressing major challenges such as epilepsy, stroke, traumatic brain injury and dementia.

Lead: Geoscience Australia (GA)

This project aims to address the cost associated with processing, merging and reformatting bathymetric data in marine modelling and management by adopting and expanding the Global Multi-Resolution Topography Synthesis (GMRT) and becoming a local platform node focused on Australia’s region of marine responsibility. The GMRT is operated by the Lamont-Doherty Earth Observatory and funded by the US National Science Foundation.

The GMRT-AusSeabed platform will provide seamless, high-quality elevation and bathymetry data to the consistent standards required for oceanographic models and predictions, accelerating research by reducing manual effort, avoiding duplication and removing the barrier of specialised skills needed for accurate bathymetry data. The outputs, suitable for high-performance computing or desktop environments, ensure the service is scalable and accessible to the wider research community while the consistency, resolution, and processing capability delivered by this platform improve the accuracy, validity and reliability of modelling.

GMRT-AusSeabed will integrate with the AusSeabed Data Hub, a cloud-hosted federated platform for open seabed mapping data. This integration will build on an already established platform to amplify the benefits to Australian marine science and the broader community.

Lead: The University of Melbourne

The Biosecurity Commons project will operate the world’s first Biosecurity Virtual Lab for use in research and decision-making. In a protected and permissioned environment, researchers will be able to investigate a wide range of questions related to biosecurity risk and response: species/host distribution, impact estimates, transmission methods, pathways, efficacy of control, effort scenarios, optimal surveillance and proof of freedom. The project will leverage the existing EcoCommons architecture, which offers a suite of common approaches to building analytical modelling outputs, as well as pulling together a vast array of geospatial data including climatic, environment, and ecological data.

For the first time, researchers across organisations will be able to securely share and reuse biosecurity data, models and analytics. This is immensely important for nationally cost-shared programs, where transparency builds trust and confidence in models and model outputs, and it will significantly accelerate research.

Lead: Australian National University (ANU)

Between Earth’s crust and core lies the mantle, a 2,900 km thick convecting layer of hot rock that is the engine driving our dynamic planet. It is responsible for almost all large-scale tectonic and geological activity. Despite this significance, we have little knowledge of the past structure and flow history of Earth’s mantle.

G-ADOPT will develop and support a computational platform for inverse geodynamics. It builds on several recent breakthroughs including (i) a surge in accessible observational datasets; (ii) advances in inversion methods, using sophisticated adjoint techniques, that provide a mechanism for fusing these observations with dynamics, physics and chemistry; and (iii) two novel software libraries, Firedrake and dolfin-adjoint. When combined, these libraries provide a state-of-the-art finite element platform that offers a radical new approach for rigorously integrating geoscientific data with multi-resolution, time-dependent, geodynamical models, through high-performance computing.

This platform will enable robust reconstructions of the history of mantle convection and its impact at Earth’s surface, addressing a fundamental challenge central to the Earth sciences. G-ADOPT will facilitate the generation of unique 4-D datasets of Earth’s evolution, which will be of great value across the geoscientific community, with traceable provenance of input data and model configuration in full compliance with FAIR principles.

Lead: Australian BioCommons

Sequencing DNA at population scale leads to a better understanding of disease causes, improved diagnosis and detection, and more options for tailored treatments. The Global technologies and standards for sharing human genomics research data project will deliver a services toolbox for improving the FAIRness of genomic data at the institutions that hold most human genomes collected for research in Australia. This project will implement standards and APIs from the Global Alliance for Genomics and Health, and bring the institutions’ data holdings into alignment with the global human genome repository (the European Genome-phenome Archive). Genomic data from thousands of Australians will be able to be shared securely and responsibly on national and global scales, enabling comparison with very large numbers of other genomes so that their full research value can be realised.

The Scalable Governance, Control & Management of FAIR Sensitive Research Data project is a national collaboration to deliver a secure, trusted and scalable environment for data governance, control and management services for data custodians and secure remote data analysis environments for research users. The project will deploy and run a proven technology called the Secure eResearch Platform (SeRP) Software Stack as a managed nationally consistent service. The SeRP service will lower barriers to making sensitive data FAIR by coalescing technology, processes and controls to build trust between data custodians, researchers and their collaborators.

A Community of Practice around SeRP will enable training, knowledge sharing, development and dissemination of best practices and principles. The project will enable both institutional and national cross-jurisdictional research projects that bring together national and global sensitive data assets and collaborations.

Lead: The University of Sydney

The Scientific workflow system for environmental health impact assessments (Air-Health) project will solve the difficulties associated with merging environmental and health datasets using existing computational and data infrastructure in such a way that no technical skills in database systems, network-based remote access or coding will be required by users.

The Air-Health project will develop a scientific workflow system for health impact assessments of air pollution that will enable users to build and extend analyses by linking data acquisition, data transformation, mathematical operations, graphing, statistical analysis and outputs. The tools will streamline desktop and web-based research processes and will be built on open software and cloud services, such as the Apache workflow system Airflow, Collaborative Environment for Scholarly Research and Analysis and AARNet Cloudstor infrastructure.
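The notion of linking acquisition, transformation, analysis and output steps can be sketched in plain Python. This is an illustration only: it does not use Airflow or any Air-Health code, and every function name and value below is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

Reading = Dict[str, float]


def acquire() -> List[Reading]:
    """Stand-in for data acquisition; values are invented for illustration."""
    return [
        {"pm25_ug_m3": 8.0},
        {"pm25_ug_m3": 12.0},
        {"pm25_ug_m3": -1.0},  # placeholder for a failed sensor reading
    ]


def transform(readings: List[Reading]) -> List[Reading]:
    """Drops physically impossible (negative) concentrations."""
    return [r for r in readings if r["pm25_ug_m3"] >= 0]


def analyse(readings: List[Reading]) -> Dict[str, float]:
    """Computes a simple summary statistic over the cleaned readings."""
    return {"mean_pm25_ug_m3": mean(r["pm25_ug_m3"] for r in readings)}


def run_workflow(steps: List[Callable]) -> object:
    """Calls the first step, then feeds each output into the next step."""
    result = steps[0]()
    for step in steps[1:]:
        result = step(result)
    return result


summary = run_workflow([acquire, transform, analyse])
print(summary)
```

Because each step only depends on the output of the previous one, a user could extend the analysis by appending a further step (for example, a graphing function) without touching the existing ones.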

The platform will use existing application programming interfaces to access environmental and health data from several sources, including state government air pollution monitors, satellites from space agencies, the Bureau of Meteorology, land use/planning agencies, the Australian Bureau of Statistics and the Australian Transport Research Cloud. Initially, these data will be standardised and an air quality workbench will be constructed with fully redeployable Docker containers for all the data inputs required for air pollution modelling. An interface will then be designed on which air pollution modelling can be performed and the corresponding health impacts determined for pollutants and populations of interest.

Lead: University of New South Wales (UNSW)

The Australian Housing Data Analytics Platform (AHDAP) will provide a unique federated platform for the ingestion and management of digital data on housing and the built environment. This platform and its suite of tools will allow rapid multi-scale complex modelling and simulation to address the pressing questions regarding housing provision and sustainability across Australia.

By implementing a common, extensible data model to consolidate Australian housing data, the AHDAP will facilitate research into areas such as housing supply, affordability and diversity, supporting policy decisions that are fair, data-driven, and accurate. The AHDAP will provide researchers and planners with a transformative capability to objectively design and evaluate new policy and practice with regard to the future development of Australia’s conurbations, helping to drive economic recovery, social inclusion and resilience across Australia’s $7 trillion housing market.

This project brings together key Federal agencies responsible for researching and monitoring national housing and planning policy, a collaboration that will deliver a sustainable national governance model for Australia’s digital housing assets and provide researchers and policymakers with a prioritised set of nationally harmonised housing data.

Lead: University of Newcastle (UoN)

The Time-Layered Cultural Map of Australia platform (TLCMap) is a software ecosystem meeting the digital mapping needs of humanities and social science researchers. The current TLCMap is unique in offering the means to visualise and interrogate historical and cultural data organised through spatio-temporal coordinates. The TLCMap 2.0 project will enhance the current platform through improved connectivity to relevant external platforms and archives and to national place-name authorities. It will also add new features in the handling of spatial and temporal data.

TLCMap 2.0 will be a no-code or low-code ‘data fusion’ platform allowing researchers to: connect to multiple data sources; use pre-existing or custom-made spatio-temporal tools to analyse and visualise data with minimal programming knowledge; export results to standard analytics tools; and enrich data sources with spatio-temporal tags. It will further lower the barrier to entry for humanities and social science researchers through better pathways in the website, added tutorials and documentation, and new features.

Lead: Griffith University (GU)

The FishID project aims to transform environmental monitoring of aquatic ecosystems in Australia through automated detection and identification of animals in underwater imagery. The FishID platform will overcome the cost associated with manually processing and extracting data from underwater cameras by creating a user-friendly, public-facing end-to-end pipeline for deep learning detection and automated identification of animals. FishID will deliver a robust and intuitive system for researchers to annotate imagery and to train and evaluate deep learning models that accurately detect, identify and count species of interest across coastal and marine ecosystems.

The project will enable a step-change in monitoring efficiency that will improve outcomes across multiple sectors such as:

  • marine environmental monitoring (e.g. environmental assessments, State of Environment reporting)
  • river health monitoring (e.g. Murray-Darling Basin Authority)
  • fisheries assessments (e.g. State Fisheries departments)
  • aquatic habitat restoration (e.g. NGOs investing $130 million in reef restoration over 5 years), some already with streaming cameras suitable for automated image analysis
  • education (e.g. Moreton Bay Live streaming cameras by QLD Environmental Education Centre)
  • tourism (e.g. Great Barrier Reef live streaming cameras with Cairns Aquarium).

Lead: The University of Sydney

The Veterinary and Animal Research Data Commons (VARDC) project builds on the success of VetCompass Australia, which collates electronic patient records (EPRs) from hundreds of veterinary practices nationally and aggregates clinical data for researchers to interrogate.

The Veterinary and Animal Research Data Commons project will transform VetCompass Australia to create a platform that is a single point of access to multiple, related systems and data types. It will facilitate collaboration with Australia’s three leading veterinary pathology providers to ingest pathology reports and work with the Australian Imaging Service to host images alongside clinical data.

The VARDC platform will deliver a framework that shows researchers how to design sample-based and big-data studies and provide optimised access to the data. For the first time, pathology providers and clinical researchers will have a comprehensive data set that includes full clinical histories to assist in setting points of reference and standard intervals, and in assessing survival data and treatment interventions. More accurate research will in turn improve disease surveillance, health outcomes for pets and diagnostic efficiency (an economic and social benefit to owners). As a world-leading veterinary database, it has an advantage over current human public health initiatives in that there are fewer privacy concerns for animals than for humans. This supports the development of geospatial disease surveillance and text mining projects that boost human health outcomes.

Lead: The University of Queensland (UQ)

Text analytics in Australian research happens either at a basic, generic level (handled with standard packages) or at a very specialised level with hand-crafted code. The aim of the Australian Text Analytics Platform (ATAP) is to fill the space between these two possibilities, with tools that are more powerful than those in the standard packages yet accessible to the many researchers who do not have strong coding skills.

ATAP will transform and accelerate text-based research across disciplines by providing Australian researchers with access to an integrated notebook-based platform for processing and mining text data, with self-service training in text analytics. It will confer greater flexibility and transparency in research workflows by building a community that brings together developers and users of text analytics in an open-source, collaborative environment.