ARDC Leadership Series: Enabling Research Translation
As part of our Research Software Agenda for Australia, the ARDC is working with the research community to shape better research software in order to recognise it as a first-class output of research. This interview is the eighth in a series about research software engineers in Australia. Each month we talk to a leading research software engineer about their experiences and best-practice tips in creating, sustaining and improving software for research.
Continuing the series, we spoke with the winner of the Australian Bioinformatics and Computational Biology Society (ABACBS) ‘Torsten Seemann’ award for an Outstanding Bioinformatics Software Developer. The ARDC sponsors this award to recognise the value of the winner’s software to their community and to promote further efforts to develop and share bioinformatics methodologies broadly.
Meet Dr Michael Roach, a Research Fellow and bioinformatician at the Flinders Accelerator for Microbiome Exploration (FAME), Flinders University, who develops and maintains Hecatomb, a free and open-source bioinformatics tool.
I had no formal training in software engineering and am self-taught. My move into research software engineering was born out of necessity. During my PhD I had to do some scripting, but it wasn’t until my first post-doc at the Australian Wine Research Institute (AWRI) that I found myself doing a lot more coding. My main project was the Chardonnay genome project. We needed to create a novel pipeline to identify markers for different clones of Chardonnay using our new assembly. My time at the AWRI made all the difference in developing my skills and experience in software development as I was given a lot of freedom to pursue these interests. Since joining the Flinders Accelerator for Microbiome Exploration (FAME) I’ve continued to devote a lot of my energy to research software engineering. Viral metagenomics is still somewhat in its infancy and there is no shortage of opportunities for research software engineering projects in this field.
Hecatomb was originally designed as a tool for read-based annotation of viral sequences. Our collaborator Scott Handley at Washington University in St. Louis created the original pipeline as a collection of bash and R scripts. Shortly after I joined FAME, in January 2021, I took on the role of developing Hecatomb to get it ready for publication. I overhauled most of the pipeline to make it more efficient, robust, and user-friendly. I also included thorough documentation and tutorials for performing the data analysis in R and Python. I wasn’t completely happy with manually kicking off a Snakemake pipeline, so I created a command line interface that makes running the pipeline a breeze. It’s still early days for Hecatomb, but we’ve already used it in several publications and many other ongoing projects. We’ve also run a few workshops on Hecatomb and have had a lot of interest from the community.
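To illustrate the idea of wrapping a Snakemake pipeline in a command line interface, here is a minimal Python sketch. It is not Hecatomb's actual code: the program name `mypipeline`, the `workflow/Snakefile` path, and the `--reads`, `--output` and `--threads` options are all hypothetical and chosen only for illustration. The sketch turns a few user-friendly options into a Snakemake command line using the standard `--snakefile`, `--cores`, `--config` and `--dry-run` flags, and prints the command instead of running it so that it works without Snakemake installed.

```python
import argparse
import shlex


def build_snakemake_command(snakefile, cores, config, dry_run=False):
    """Translate wrapper options into a Snakemake argument list."""
    command = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if config:
        # Snakemake takes configuration overrides as key=value pairs after --config.
        command.append("--config")
        command.extend(f"{key}={value}" for key, value in sorted(config.items()))
    if dry_run:
        # Ask Snakemake to show the planned jobs without executing them.
        command.append("--dry-run")
    return command


def parse_args(argv=None):
    """Define the (hypothetical) user-facing options of the wrapper."""
    parser = argparse.ArgumentParser(
        prog="mypipeline",
        description="Hypothetical wrapper around a Snakemake workflow.",
    )
    parser.add_argument("--reads", required=True, help="directory of input reads")
    parser.add_argument("--output", default="mypipeline.out", help="output directory")
    parser.add_argument("--threads", type=int, default=4, help="number of cores")
    parser.add_argument("--dry-run", action="store_true", help="plan only")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    command = build_snakemake_command(
        snakefile="workflow/Snakefile",  # assumed location of the workflow file
        cores=args.threads,
        config={"reads": args.reads, "output": args.output},
        dry_run=args.dry_run,
    )
    # Print rather than execute; a real wrapper would typically hand the list
    # to subprocess.run(command, check=True).
    print(shlex.join(command))
    return command


# Example invocation with explicit arguments (hypothetical paths).
main(["--reads", "samples/", "--threads", "8", "--dry-run"])
```

Keeping the command construction in its own function, separate from argument parsing, makes the wrapper easy to test without launching a workflow.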
I was blown away to learn that I had won the ‘Torsten Seemann’ Outstanding Software Developer Award. It was Torsten’s talk at BioInfoSummer 2016 on developing bioinformatics software that motivated me to publish my first pipeline and I’ve been hooked on writing software ever since. I would love to pursue a career in academia as a group leader. There is unfortunately a lot of attrition at my career level, especially given the current state of research funding, but this award will go a long way in helping me to realise this dream.
At FAME we understand the value of awards for students and ECRs. We spend a lot of time and effort helping the lab members polish their posters and talks ahead of conferences and we’ve been quite successful so far. It’s exciting to see ABACBS and the ARDC supporting research software engineering with these awards and I’ll be doing everything I can to promote them in the years to come.
Great question! Our paper “Ten simple rules and a template for creating workflows-as-applications” tackles a lot of these and it ships with example templates (read the article).
The biggest lesson if you’re just starting out in bioinformatics would be to use a workflow manager like Snakemake or Nextflow. They do so much heavy lifting for very little effort. Even when I’m doing a one-off analysis, I’ll make a Snakemake pipeline because it’s easier overall than dealing with bash scripts and it makes sharing your code more FAIR.
Also, don’t be afraid to reach out and ask for help. After Hecatomb, I decided to write the 10 simple rules paper based on what we had done with our command line interface for the Snakemake pipeline. I saw that Titus Brown’s group were already doing the same thing with their pipelines and had much better ways of incorporating some of the features. I also reached out for help with Nextflow and overall the paper was much better because of these awesome collaborators.
I’m a member of a few societies that I would highly recommend. The Australian Bioinformatics And Computational Biology Society (ABACBS) is the premier society for anything biology plus computers. If you use workflow managers often you should consider the BioCommons workflow community.
While not specifically software-focused, software is becoming ever more crucial in all facets of research and the more traditional societies have a growing representation of software creators. I’m the state representative for South Australia for the Australian Society for Biochemistry and Molecular Biology (ASBMB) and I’m on the committee for the Adelaide Protein Group (APG) which is a special interest group of ASBMB. If you like microbes, check out the Australian Society for Microbiology (ASM) and the Joint Academic Microbiology Seminars (JAMS).
You can connect with Michael via GitHub, Twitter, Mastodon, and LinkedIn.
If you’d like to be part of the growing community of research software engineers in Australia, become a member of the RSE Association of Australia and New Zealand (RSE-AUNZ) (it’s free!).
The Astronomical Society of Australia (ASA) has launched the Emerging Leaders in Astronomy Software Development Prize, sponsored by the ARDC, which is now open for nominations with a closing date of Wednesday 15 February 2023. Apply now >
The Ecological Society of Australia has an ARDC-sponsored award for New Developers of Open Source Software in Ecology. Find out about last year’s winner in our recent interview. If you are creating or maintaining research software for ecology, apply now >
Entries for the 2023 Statistical Society of Australia (SSA) Bill Venables Award for new developers of open source software for data analytics, sponsored by the ARDC, will open in early March 2023. More information >
Entries to the 2023 Australian Research Data Commons Eureka Prize for Excellence in Research Software will open on Monday 13 February. More information >
Start planning your submission now!
Stay tuned for our next interview in the Shaping Research Software series, coming out in March.
Learn more about the ARDC’s Research Software Agenda for Australia.
The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.