Advanced microscopes underpin a huge range of critical research, from the ultra-light and strong materials of the future to life-saving treatments for cancer. However, as these instruments have evolved, the quantities of data they produce have grown with them. This raises the question: how can this data be stored and managed so that it is findable, accessible, interoperable and reusable (FAIR) for further research?
A new project supported by the Australian Research Data Commons (ARDC), Australian Characterisation Commons at Scale (ACCS), aims to resolve the big-data challenges faced by microscopy facilities around the country by providing tools, best practice guides, specialised training and knowledge sharing. The project is a partnership between 20 Australian institutions*, including universities, research institutions and national research infrastructure facilities.
“Recent innovations in microscopy instruments have come with significant challenges for facilities regarding the amount of data that they can generate, which can amount to terabytes of data per day,” said Dr David Poger, Research Data Manager at Microscopy Australia and ACCS project team member.
To uncover the challenges facilities face when managing data, the team surveyed microscopy facilities across Australia and internationally. The findings were published in a first-of-its-kind discovery report, “Orchestration and management of data generated by big-data electron microscopy instruments: A Discovery report”. The report identifies the trends, tools, procedures and gaps at these facilities, and makes recommendations to the ACCS program, and to microscopy facilities across the world, on how to address them.
Some of the challenges cited by the facilities include:
- How can large volumes of data be moved quickly and efficiently with limited impact on the institution’s network performance?
- Which software packages and infrastructure can be used to process large data volumes optimally?
- What needs to be stored, and for how long, given the large data volumes generated?
- Is it possible to automate many or most of the steps from the point of data capture to where the data is stored, while ensuring the data remains accessible?
“Although such challenges may seem basic, their technical solutions identified by the report often required expert knowledge, dedicated resources and software packages that most facilities wouldn’t be able to access or deploy on their own,” Dr Poger elaborated.
The ACCS project leader, Prof. Wojtek Goscinski, Associate Director of the Monash eResearch Centre, added: “The discovery report provides us with a community roadmap outlining the way microscopy facilities and their users are approaching these challenges and identifying international best practice.”
Collaborating to solve big-data challenges
Through the ACCS project, microscopy facilities across Australia will be working together to solve the big-data challenges they face.
Prof. Antoine van Oijen, NHMRC Leadership Fellow and Director of Molecular Horizons at the University of Wollongong, said: “We’ve established a new cryo-electron microscopy facility here at the University of Wollongong and have been working hard during the last couple of years on putting in place not only the microscopy equipment, but also the processes and workflows to manage the large amounts of data we get from these microscopes.
“Being part of the ACCS project is a wonderful opportunity that has allowed us to partner with experts all over the country who are dealing with similar problems, to learn from them and to work together to create solutions that work not just locally, but at a national scale.”
The discovery report has formed the basis for current work that will help microscopy facilities move, store and manage data more efficiently. An academic publication will follow in the coming months.
*The Australian Characterisation Commons at Scale (ACCS) is supported by the ARDC and is a partnership between Monash University, AARNet, Bioplatforms Australia, Flinders University, EMBL Australia, Microscopy Australia, National Imaging Facility, Pawsey Supercomputing Centre, QCIF, RMIT University, Swinburne University, The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, University of New South Wales, University of Queensland, University of South Australia, University of Sydney, University of Western Australia and University of Wollongong.
The ARDC, EMBL Australia, Microscopy Australia, National Imaging Facility and Pawsey Supercomputing Centre are funded through the National Collaborative Research Infrastructure Strategy (NCRIS).