An ARDC case study

Tens of thousands of life scientists have used the research platform Galaxy over the past decade to analyse large biological datasets, accelerating research and innovation to understand diseases and improve health.

Our genome, or genetic code, influences how we age and how we respond to certain diseases and medications.

The original Human Genome Project took 13 years of work by researchers around the globe and cost close to US$3 billion. For the first time, we could read the complete genetic blueprint for human life. This global effort to sequence the human genome pushed the boundaries of technology and thinking and spawned a new industry of high-throughput genome sequencing with new data analytics methods to match. A new era in medicine began, and great advances were made in the types of technology used to sequence DNA.

Today, a human genome can be analysed in hours thanks to third-generation DNA sequencers paired with the latest data-processing algorithms. Combined with improvements in computing techniques and in the processing power needed to map a genome, this means the cost of genetic analysis has plummeted. Researchers have seized the opportunity to use this wealth of genomic data to better understand life on Earth.

However, while the amount of genomic information has exploded, the capacity to analyse it has not kept pace, and neither have researchers' skills in using complex statistical and mathematical analysis tools.

Accelerating Genomic Research with the Galaxy Research Platform

The structure of the mitogen-activated protein kinase kinase kinase 10 (MAP3K10, X. laevis) kinase domain, as predicted by AlphaFold 2.0 running on Galaxy Australia. AlphaFold is an AI system developed by DeepMind that predicts a protein's 3D structure from its amino acid sequence; its database of predicted structures is provided in partnership with EMBL's European Bioinformatics Institute.

Research platforms help researchers overcome analysis bottlenecks. Using Galaxy, the platform of the global Galaxy Project, researchers can analyse large biological datasets without advanced analytical or software engineering skills, and without having to arrange their own access to compute and storage. An international open-source platform, Galaxy is supported in Australia, Europe, the US and many other countries for the benefit of researchers worldwide.

Galaxy has helped tens of thousands of life scientists analyse large biological datasets found in genomics, proteomics, metabolomics, phenomics, transcriptomics, epigenomics and imaging. As a result, research and innovation are accelerating, from understanding diseases suffered by millions of people each year to mapping the genomes of threatened species to aid conservation.

Researchers can test, evaluate and review their peers' work within the Galaxy platform, potentially placing the next cure for chronic disease within reach of anyone with the dedication and skills to look for it. Thanks to built-in workflows and reference genomes, researchers using Galaxy can start analysing their data weeks earlier than they otherwise could.

Dr Gareth Price is General Manager of Galaxy Australia and Head of Computational Biology at QCIF. He shares just how easy Galaxy is for a non-technical researcher to use: “In Galaxy, each analytical tool has dropdown menus, free-text fields, and check buttons. All that is seamlessly turned into command line executable code in the background, so the user doesn't need to know any software programming to be able to analyse a genome.”
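To make this description more concrete, here is a minimal Python sketch of the general idea of turning form inputs into a command line. It is an illustration under stated assumptions, not Galaxy's implementation: real Galaxy tool wrappers define their forms and command templates in XML (using the Cheetah templating language), and the tool name `trim_reads`, its form fields and its flags below are all hypothetical.

```python
import shlex

# Hypothetical tool definition, for illustration only: the executable name,
# form field names and flags are invented and do not come from a real wrapper.
TOOL = {
    "executable": "trim_reads",
    "fields": [
        # (form field name, command-line flag, widget type)
        ("input_reads", "-i", "text"),
        ("output_reads", "-o", "text"),
        ("min_quality", "-q", "text"),
        ("encoding", "--phred", "dropdown"),
        ("trim_poly_g", "--trim-poly-g", "checkbox"),
    ],
}


def build_command(tool: dict, form_values: dict) -> str:
    """Translate the values a user entered in a tool form into a command line."""
    parts = [tool["executable"]]
    for name, flag, widget in tool["fields"]:
        value = form_values.get(name)
        if widget == "checkbox":
            # A ticked checkbox becomes a bare flag; an unticked one is left out.
            if value:
                parts.append(flag)
        elif value is not None and value != "":
            # Free-text and dropdown values become "flag value" pairs.
            parts.extend([flag, str(value)])
    # Shell-quote every token so file names with spaces cannot break the command.
    return " ".join(shlex.quote(p) for p in parts)


if __name__ == "__main__":
    form = {
        "input_reads": "sample 1.fastq",
        "output_reads": "trimmed.fastq",
        "min_quality": 20,
        "encoding": "33",
        "trim_poly_g": True,
    }
    print(build_command(TOOL, form))
    # prints: trim_reads -i 'sample 1.fastq' -o trimmed.fastq -q 20 --phred 33 --trim-poly-g
```

The user only ever fills in the form; the mapping from each widget to a flag lives in the tool definition, which is why a tool can be wrapped once and then run by researchers who never see the command line.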

Investing in Research Platforms for Long-term Impact

Galaxy Australia is the local branch of the global Galaxy Project. The ARDC’s investment in the platform — through the ARDC Nectar Research Cloud and the ARDC Platforms program — has seen Galaxy Australia become an essential service for training and analysis in data-intensive research in the life sciences. Our investment ensures that Galaxy Australia maintains the tools, workflows and reference datasets essential for the Australian research community to remain competitive and innovative in the global research system.

Since 2012, Galaxy Australia has been hosted in the ARDC Nectar Research Cloud. The service has seen rapid uptake by researchers and now has over 17,400 users, with 5,000 new users joining in the past 2 years alone and 1.5 million jobs submitted in the past year.

Over the past 2 years, the ARDC and our Nectar node partners have increased the compute and storage capacity for the platform, including by adding large-memory servers. These servers have been game-changers for efficiency, giving Galaxy users immediate access to powerful tools for machine learning, cheminformatic analysis and long-read sequence analysis. For example, the genome of Australia's national floral emblem, the Golden Wattle (Acacia pycnantha), was assembled in less than 24 hours, an unprecedented speed for a genome of its size.

To further broaden Galaxy Australia’s capabilities, we have co-invested in the BioCommons BYOD [Bring Your Own Data] Expansion Project. Already underway, the project will increase the number of research communities that can use the BioCommons platform and the types of analyses it can perform. The ARDC co-investment of $2.21 million bolsters the contributions of the Australian BioCommons, The University of Melbourne, Bioplatforms Australia, AARNet, the Australian Access Federation, the National Computational Infrastructure (NCI), the Pawsey Supercomputing Centre, QCIF, Melbourne Bioinformatics, The University of Queensland and the Sydney Informatics Hub.

According to Professor Andrew Lonie, Director of the Australian BioCommons, digital technologies are proving transformational for researchers in the life sciences.

“The enhanced Galaxy Australia platform will position Australia at the forefront of bioinformatics infrastructure and substantially improve Australian researchers’ access to bioinformatics,” said Professor Lonie.

Global Collaboration on Emerging Diseases

The COVID-19 pandemic is the first health crisis in history where researchers have been able to access vast amounts of genomic data. Open data, combined with open analytics and computational infrastructure, has played an essential role in accelerating research to understand and respond to the pandemic.

The development of fast and effective pandemic countermeasures relies on the global research community’s ability to share data and perform fast and reproducible analyses.

In 2020, the global Galaxy platform responded to the urgent need for insight into the SARS-CoV-2 virus, building a truly global, democratised, reproducible and transparent approach to systematically analysing the virus.

Galaxy Australia provided vital research infrastructure for researchers scrambling to understand the new virus sweeping the world. In addition to the computational power of the ARDC Nectar Research Cloud, Galaxy Australia became part of the COVID-19 Acceleration Program of NCI and the Pawsey Supercomputing Centre, giving researchers working on the virus access to high-performance computing.

The resources provided by Galaxy made it possible for researchers anywhere in the world to perform their own analyses with the freely available data, analysis pipelines and public computational infrastructure.

A year and a half into the pandemic, when WHO declared the Omicron lineage a variant of concern in November 2021, it asked countries to ‘enhance surveillance and sequencing efforts to better understand circulating SARS-CoV-2 variants’.

By then, the Galaxy Project had already been operating a free, global, public genome surveillance program for several months, based on raw sequencing data deposited in public databases. Among the contributing countries was South Africa, where the Omicron variant was first identified.

Within 3 days, the Galaxy Project announced that the first view of the mutational pattern of the Omicron lineage was available on the platform. Derived transparently and fully reproducibly from raw sequencing reads, it was immediately available to the global research community via Galaxy.

Ready for the Next Health Crisis

The world continues to grapple with the evolving COVID-19 pandemic. Thanks to the ARDC's ongoing investment in Galaxy, Australian researchers are improving global understanding of the virus, as well as of diseases with a strong genetic component, such as breast cancer, Crohn's disease and cystic fibrosis. When the next health crisis inevitably occurs, this enduring digital research infrastructure will ensure that Australian researchers are ready to respond rapidly.

Galaxy Australia is an Australian BioCommons service, jointly supported by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) through the ARDC and Bioplatforms Australia; the Queensland Government’s Research Infrastructure Co-investment Fund; and The University of Melbourne.

Managed by QCIF, Melbourne Bioinformatics and AARNet, Galaxy Australia is underpinned by computational resources provided by AARNet, the ARDC, The University of Melbourne, The University of Queensland, QCIF, National Computational Infrastructure, and the Pawsey Supercomputing Centre.


Written by Jo Savill, ARDC. Edited by Mary O’Callaghan. Reviewed by Dr Gareth Price, Dr Christina Hall, Prof Andrew Lonie, Dr Paul Coddington, Carmel Walsh, Dr Andrew Treloar, Andy White, Adelle Coote, Ian Duncan, Rosie Hicks.
