Digital Research Training in Geoscience and Astronomy
From the Earth to the stars, developments in digital research are requiring geoscientists and astronomers to learn new data skills.
This webinar summarises two presentations given at the Australian eResearch Skilled Workforce Summit, held in September 2019. The presentations focus on addressing data management and skill development challenges in Earth sciences and astronomy.
Watch the webinar recording, and read the summaries of the presentations below.
Presenters
- Dr Lesley Wyborn (NCI, ARDC, AuScope) covers meeting FAIR Data skills challenges in the geosciences, including interoperability and reusability, and the AGU-led Enabling FAIR Data project. She also talks about the need for hybrid skills that bring computational and data expertise together, and the work undertaken to identify the skill sets needed.
- Dr Robert Shen (Astronomy Australia Ltd) explains how ADACS (Astronomy Data and Computing Services) was established to provide focused data services and infrastructure to astronomy, as well as expert training.
Lesley Wyborn: Meeting FAIR Data Skills Challenges in the Geosciences
Lesley Wyborn discussed the FAIR data principles (Findable, Accessible, Interoperable, Reusable) and the challenges the Earth sciences face in implementing them effectively, focusing on the Interoperability (I) and Reusability (R) aspects.
Key Points:
- The push for FAIR data practices came from the American Geophysical Union (AGU), which emphasised that Earth and space science data is world heritage that needs proper documentation and credit.
- Historically, scientific data was analogue and published in papers. Since the advent of computers (1940s onward), data volumes and resolution have exploded, making it infeasible to publish all data in the traditional way.
- The “dark ages” of scientific data (1970s) saw the loss of much valuable data due to inadequate sharing practices.
- FAIR principles require data to be:
  - Assigned persistent identifiers (DOIs)
  - Registered with rich metadata
  - Machine-actionable, not just human-readable
  - Linked to repositories rather than supplemental files
- The effort involved collaboration between publishers, repositories, and researchers, and required common policies and workflows to standardise data submission processes.
- A major bottleneck is the lack of hybrid professionals who combine domain expertise with data/computational skills to make data FAIR-compliant, especially for machine actionability.
- Training gaps exist: librarians excel at metadata and identifiers, computer scientists excel at protocols and machine actionability, but hybrids who can bridge both domains are scarce.
- The concept of the “X Factor” or hybrid scientist was introduced, referring to professionals with dual expertise in computational/data science and domain science.
- A 2012 pilot study at Geoscience Australia assessed skills needed in such hybrids, revealing:
  - Scientists had stronger backgrounds in applied mathematics, geophysics, and computational modelling.
  - Technical teams had stronger computer science and software engineering skills but also learned domain knowledge.
  - Behavioural traits included intuition, ecological nonlinear thinking, willingness to adopt new methods, and strong communication.
- Organisational support is critical, including recognition and fostering of hybrid roles.
- The greatest challenge remains making data machine actionable, requiring community vocabularies, standards, and collaborative efforts across disciplines.
- Many Earth science subfields, such as seismology, are well advanced in implementing FAIR, whereas others, such as geochemistry, lag behind because they lack standards and community agreement.
- Overall, the Earth sciences face a significant skills and cultural challenge to fully realise FAIR data benefits.
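The FAIR requirements listed above (a persistent identifier, rich metadata, machine actionability, and community vocabularies) can be illustrated with a small sketch. This is not the AGU project's tooling or any repository's actual schema: the field names, the `fair_issues` helper, the tiny rock-type vocabulary, and the DOI in the sample record are all assumptions made up for illustration. The point is only that a record a program can check is different from one that merely reads well to a human.

```python
import json
import re

# Hypothetical required fields; a real repository defines its own schema.
REQUIRED_FIELDS = ("identifier", "title", "creator", "license", "repository_url")

# Hypothetical controlled vocabulary; a real community would publish and govern its own.
ROCK_TYPE_VOCAB = {"basalt", "granite", "sandstone"}

# Loose check for the general shape of a DOI ("10.<registrant>/<suffix>"); illustrative only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def fair_issues(record):
    """Return a list of problems found; an empty list means all checks passed."""
    issues = []
    # Rich metadata: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Persistent identifier: must at least look like a DOI.
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        issues.append("identifier does not look like a DOI")
    # Machine actionability: free-text terms are replaced by vocabulary terms.
    rock_type = record.get("rock_type")
    if rock_type is not None and rock_type not in ROCK_TYPE_VOCAB:
        issues.append(f"rock_type not in vocabulary: {rock_type}")
    return issues


record = {
    "identifier": "10.1234/example-dataset",  # made-up DOI for illustration
    "title": "Example whole-rock geochemistry dataset",
    "creator": "Example Researcher",
    "license": "CC-BY-4.0",
    "repository_url": "https://repository.example.org/datasets/1",
    "rock_type": "basalt",
}

print(json.dumps(record, indent=2))
print(fair_issues(record))
```

A record with a local file name as its identifier, no licence, and a free-text rock type would fail all three checks, which is the kind of gap a hybrid data and domain specialist is needed to close.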
Robert Shen: Leveraging ADACS to Support Astronomy Skill Development
Robert Shen presented on ADACS (Astronomy Data and Computing Services), a national initiative supporting astronomy data skills and infrastructure in Australia.
Key Points:
- ADACS was established following recommendations from the 2016 Decadal Plan for Australian Astronomy, which prioritised world-class HPC and software capabilities for astronomy.
- The Computing Infrastructure Planning Working Group recommended:
  - Establishing ADACS to provide training, support, and expertise.
  - Investing $7-15 million every five years for data and computing infrastructure.
- ADACS officially began in 2017, with two nodes: one in Melbourne (Swinburne University) and one in Perth (Curtin University).
- ADACS offers three main service components:
  - Training and Education:
    - Face-to-face and online courses tailored specifically to astronomy data challenges (e.g., Python use focused on astronomy datasets).
    - Partnerships with industry and research centres for specialised training (e.g., GPU programming with CUDA).
    - Hackathons and outreach events combining data science and practical astronomy (e.g., ‘Cloudy Skies’ event using NASA Juno mission data).
    - Online learning platforms, YouTube videos, and GitHub repositories with training materials.
    - Internship programs connecting PhD students with real data and computing problems, supporting career development.
  - National Support:
    - Embedding data and computing experts within research teams for short-term (up to 6 months) and long-term support.
    - Services include data pipeline optimisation, HPC code enhancement, and data management planning.
    - Collaboration with major publishers (e.g., PASA) to facilitate data deposition.
  - National Infrastructure:
    - Ensuring availability of data storage and HPC resources dedicated to astronomy.
    - Partnership with the national HPC provider, NCI, to allocate millions of CPU hours annually to astronomy researchers.
    - Developing cloud computing resources to meet growing community needs.
- Future plans include investing over $5 million to enhance data centres, particularly for projects like gravitational wave data management.
- ADACS also focuses on industry engagement, commercialising astronomy technologies and integrating commercial cloud computing resources.
- Robert emphasised the critical role of the ADACS technical team in the success of these initiatives.
Conclusion
The webinar emphasised the evolving landscape of research data management and the critical need for hybrid skills combining domain knowledge and computational expertise. While Earth sciences grapple with FAIR data challenges, astronomy offers a model of coordinated infrastructure and tailored training via ADACS. Sustained investment in skills development, infrastructure, and community engagement is essential to unlock the full potential of large-scale scientific data.