As part of our Research Software Agenda for Australia, the ARDC is working with the research community to shape better research software and to have it recognised as a first-class output of research.
This interview is the fourth in a series about research software engineers. Each month we will talk to a leading research software engineer about their experiences and best-practice tips for creating, sustaining and improving software for research.
Continuing the series, we talked to Johanna Bayer from the University of Melbourne.
What do you do as a research software engineer?
I am a final year PhD candidate at the University of Melbourne. My PhD focuses broadly on projects that improve the reproducibility of research on large neuroimaging datasets.
My first project uses machine learning methods to find biomarkers for mental health conditions such as depression. I’ve developed a method based on normative modelling, which allows us to identify individuals who deviate from the normative trajectory of brain growth. New normative modelling algorithms offer a novel way to parse and describe variant components and allow for individualised predictions.
Another project of my PhD focuses on site or batch effects, which emerge when large neuroimaging datasets are created by pooling data across studies. I have written a review of methods to correct for those site effects, explaining their underlying mathematical foundations, and have published the relevant normative modelling software.
Tell us about a project you enjoyed working on
One of my favourite projects was being a Subject Matter Expert on Open Software for the NASA Transform to Open Science (TOPS) program.
NASA declared 2023 the Year of Open Science and launched a sprint to create the curriculum for a course to be hosted on the edX platform. I joined a group of open science experts from around the globe to create the chapter on Open Software in a sprint of just 5 days! The course has 6 chapters on subjects such as Licensing/Ownership, Code Management/Quality, Version Control, and Contributing. It was a super fun experience overall, and I particularly liked the project because it showed what can be done in a short amount of time when everyone is committed.
How does your work as an RSE advance research?
I hope that my work as an RSE makes a small contribution to overcoming the reproducibility crisis in clinical neuroimaging. The reasons for unreproducible results are manifold, but they certainly include small sample sizes and non-transparent methods. My research focuses on individualised predictions and on the effective use of large, pooled, publicly available datasets.
In addition, I am trying to disseminate knowledge on how to do open research: I have been selected as a ReproNim/INCF Fellow. This is a year-long train-the-trainer fellowship program run jointly with the International Neuroinformatics Coordinating Facility (INCF). In this program, fellows become trainers in FAIR data, computational basics, reproducible workflows and statistical tools. As part of that, I am creating a course on open science for our institute, similar to Transform to Open Science (TOPS) but on a smaller scale.
When I started out as a PhD student, I remember how lost I felt, especially with the organisational side of writing software (folder structure, paths, version control, test coverage, etc.). My aim is therefore to share as much of my journey as possible, to help others gain this knowledge early on.
Have you collaborated to extend existing software?
Collaborating on existing software is quite educational. Let’s face it: any piece of public software is probably more standardised, tested and documented than anything you will write for yourself. So collaborating forces you to adhere to those standards, write tests, and so on. And it can teach you quite a bit about writing good code along the way.
I have contributed to open source projects on various occasions. The most recent is the INCF MathWorks Toolbox Summer Project, where I contributed to the PhysIO Toolbox for physiological noise correction in fMRI.
What best practices for quality software do you implement in your work?
I have a set of standards that I have collected over time, find useful and try to adhere to:
- Write tests (unit and integration).
- Fail hard and early. Write your tests so that errors cannot be dragged through the run of a script or pipeline undetected (and cause a chain of dependent errors down the line).
- Keep at least 2 branches in a Git project: one dev(elopment) branch and one main branch. New work NEVER goes straight into main.
- One function, one functionality.
- Name your variables well.
- Write documentation.
- NEVER save the output of a script (this forces you to make your workflows 100% reproducible).
- Try to get someone to review your code. Just the thought that someone will look at it will make you write better code in the first place.
- Refactor your code if it’s bad or error-prone. Even if that means a major redesign of your code structure, you will 100% save more time by setting it up cleanly once than by dragging messy code through the life of your project.
- Try to write generic code: no hard-coding of variables, paths, etc. Again, one of those habits that takes a bit of time at the beginning but saves a lot of time in the end.
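Two of the habits above, failing hard and early and avoiding hard-coded paths, can be sketched in a short Python example. Everything here (the script, the one-score-per-line file format, the function names) is a hypothetical illustration, not code from Johanna’s projects: the idea is simply that bad input stops the run immediately with a precise message, and that all paths arrive via the command line rather than being baked into the script.

```python
import argparse
from pathlib import Path


def load_scores(path: Path) -> list:
    """Parse one numeric score per line; fail hard on the first bad value."""
    scores = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        try:
            scores.append(float(line))
        except ValueError:
            # Fail early with a precise location instead of silently skipping,
            # so the error cannot propagate further down the pipeline.
            raise ValueError(f"{path}:{lineno}: not a number: {line!r}")
    if not scores:
        raise ValueError(f"{path}: file is empty")
    return scores


def mean(values):
    return sum(values) / len(values)


def main(argv=None):
    # No hard-coded paths: the input file comes in via the command line,
    # so the same script runs unchanged on any machine or dataset.
    parser = argparse.ArgumentParser(description="Average one score per line.")
    parser.add_argument("infile", type=Path)
    args = parser.parse_args(argv)
    if not args.infile.is_file():
        parser.error(f"no such file: {args.infile}")
    print(f"mean = {mean(load_scores(args.infile)):.3f}")


if __name__ == "__main__":
    main()
```

Run as, for example, `python average_scores.py scores.txt`; a malformed line aborts the run with its file name and line number rather than producing a quietly wrong mean.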
Which research software communities do you recommend?
The OHBM Open Science Group (OSSIG)
The OSSIG is a global special interest group of the Organization for Human Brain Mapping focusing on Open Science. We organise a hackathon and the Open Science Room. We are all volunteers, but the motivation and drive of this group have always been fantastic. You can also follow us on Twitter @OhbmOpen.
Brainhack
Even more decentralised than the OSSIG, brainhack.org organises hackathons related to brain data throughout the year and around the world. During a Brainhack Global event, several hubs around the world hold their own local hackathons around brainy stuff. Read more about the philosophy of Brainhack. Also, keep your eyes peeled for the Australia Asia Hackathon that we are going to host this year, details coming soon!
The Turing Way
Lastly, I want to highlight a community outside of neuroimaging: The Turing Way project is an open-source, community-driven handbook on reproducible, ethical and collaborative data science. Contributing to the Turing Way book does not require any coding background and is a great way for beginners to start contributing to open source. The Turing Way holds weekly Community Calls via Zoom and Fireside Chats (both usually at rather inconvenient times for the Australian time zone), but you can join the Slack to get started and oriented. View the Slack and all resources.
Keep in Touch
If you’d like to be part of the growing community of research software engineers in Australia, become a member of the RSE Association of Australia and New Zealand (RSE-AUNZ) (it’s free!). Remember that September is the month when RSEs come together: join us for the New Zealand conference and the Asia Australia unconference.
In our previous RSE Story, we talked with the winners of the Venables Award, which was established by the Statistical Society of Australia in partnership with the ARDC. The winners will present their software in a public talk on Thursday 8 September, 1–2 pm; you can register now.
Stay tuned for our next interview in the Shaping Research Software series, coming out in November.
Learn more about the ARDC’s Research Software Agenda for Australia.
The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.
Johanna Bayer was interviewed by Dr Paula Andrea Martinez, ARDC. Reviewed by Jo Savill (ARDC).