This ARDC series aims to drive recognition of research software and its authors. Each month, we talk to leading actors in the research software engineer (RSE) space and share their experience creating, sustaining and improving software for research.
Continuing the series, we talked with the winners of the 2024 Venables Award for New Developers of Open Source Software for Data Analytics, which was established by the Statistical Society of Australia in partnership with the ARDC. Meet the winners of the shared first prize, Dr Alex Lee and Dr Rob Moss, who have each made significant contributions in their respective areas.
Alex Lee is currently a Machine Learning Engineer at Beyond Blue and Honorary Research Fellow in Cancer Services and Data Science in the Department of General Practice and Primary Care at the University of Melbourne. For his research, he has used large-scale linked primary care data to understand how cancer patients interact with the health system. His work includes developing predictive models for the early detection of cancer. He completed his PhD in mathematics and is interested in applying a variety of statistical and machine learning techniques. Working mainly in Python, he also contributes to the development of open source tools that help data scientists work efficiently.
Rob Moss is a Senior Research Fellow in the Infectious Disease Dynamics Unit at the Melbourne School of Population and Global Health. His research focuses on predicting and mitigating the burden of infectious disease epidemics through the use of mathematical / computational modelling and Bayesian inference. This includes:
- using scenario modelling to inform recommendations for public health interventions
- synthesising models and surveillance data to generate near-real-time epidemic forecasts.
He is a proponent of reproducible research, open access, and free and open source software, and has produced open source software packages through his research activities.
How did you get into research software engineering?
Alex: My background is in maths and physics, so I’m not a software engineer by training. A friend of mine once described programming as like ‘painting with maths’, which I think is a nice description. It’s satisfying to be able to turn mathematical ideas into a product that someone can use in their own work. Almost all the data science work I have done over the past 10 years involves a large amount of coding, which has led me to learn better software development and coding practices. I think good practices in coding and writing software are an important but often undervalued part of research. This has also been part of the motivation for committing to a project such as this.
Rob: I originally studied pure maths and software engineering. In pure maths we would often prove various statements, and there’s an aesthetic element of coming up with ‘elegant’ proofs. This carried over into writing code and finding out how to best express the core idea that the code is trying to solve. I’ve always loved this kind of exploration. Since graduating, my research has involved building computational models, and I’ve continued trying to write code in ways that best express the underlying ideas and concepts. This often results in relatively generic frameworks or tools, which I then apply to my particular research problems. So being an RSE (research software engineer) is a fundamental part of my research and an active area of learning.
Tell us how your Python packages whereabouts and pypfilt came to be. What specific needs do they aim to address?
Alex: In 2020 I was part of the Victorian Centre for Data Insights (VCDI), a data science consultancy within the Victorian State Government, where I collaborated with different government departments on various data science projects. Starting in the pandemic, I was working on a few COVID-19 projects with the Department of Health. One problem they were interested in was how to identify high-risk locations using data from contact tracing notes. A key part of this is record linkage and geocoding, where a street address is standardised and converted to a set of coordinates. I was aware of some work by researchers from ANU in this area, and this seeded the development of what would become whereabouts. A few of us ended up developing a system that worked quite well for matching COVID cases to locations. I later developed whereabouts, which is different from our original system but solves a similar problem, namely geocoding and record linkage. I couldn’t find any good open source geocoding packages for Python – most people rely on commercial tools such as Google’s Geocoding API. But in Australia, we have high-quality publicly available address data, and so there shouldn’t be any reason that such open source tools are not more widely available.
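As a rough illustration of the geocoding step described above (standardise a street address, then convert it to coordinates by matching it against reference data), here is a minimal sketch in plain Python. It is not how whereabouts works internally and does not use its API. The function names `standardise` and `geocode`, the abbreviation table, the similarity threshold, and the two reference addresses with their coordinates are all made up for this example.

```python
import re

# Hypothetical abbreviation table; a real system would need many more rules.
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE"}

# Made-up reference data: standardised address -> (latitude, longitude).
REFERENCE = {
    "1 EXAMPLE STREET MELBOURNE VIC 3000": (-37.81, 144.96),
    "25 SAMPLE ROAD CARLTON VIC 3053": (-37.80, 144.97),
}


def standardise(address):
    """Upper-case the address, drop punctuation and expand abbreviations."""
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", address).upper().split()
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def geocode(address, min_score=0.5):
    """Return coordinates of the best-matching reference address, or None."""
    query = set(standardise(address))
    best_key, best_score = None, 0.0
    for key in REFERENCE:
        candidate = set(key.split())
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(query & candidate) / len(query | candidate)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < min_score:
        return None
    return REFERENCE[best_key]


print(geocode("1 example st, melbourne vic 3000"))
print(geocode("99 Nowhere Lane"))
```

Token-set similarity is only one simple way to link a messy input to a reference record; it tolerates missing suburbs or postcodes but ignores token order and spelling errors, which a production tool would have to handle.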
Rob: In 2015 I began working on real-time forecasts of seasonal influenza in Australia. This involved fitting mathematical models of disease to laboratory testing data and simulating the models to predict the laboratory data one to four weeks into the future. Standard MCMC methods were potentially too slow for real-time forecasting, so we used Sequential Monte Carlo methods instead. The pypfilt package resulted from my efforts to separate the infectious disease details from the method used to fit the models to the data. I eventually realised that pypfilt might be useful to other people and began making it more user-friendly. The aim is to make it easy to fit models to time-series data, and to update these fits in real time when new data are obtained.
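To give a feel for how Sequential Monte Carlo methods update a model fit each time a new observation arrives, here is a minimal bootstrap particle filter in plain Python. It is a sketch only and does not use or reflect the pypfilt API. The random-walk model, the noise levels, the prior, and the function name `bootstrap_particle_filter` are assumptions chosen for this example.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, seed=1):
    """Estimate a latent random-walk state from noisy observations."""
    rng = random.Random(seed)
    process_sd = 1.0  # assumed size of the random-walk steps
    obs_sd = 2.0      # assumed observation noise
    # Draw the initial particles from a wide prior.
    particles = [rng.gauss(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the process model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the likelihood of the new observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to equal weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. Record the weighted mean as the current state estimate.
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 4. Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = bootstrap_particle_filter([5.0] * 20)
print(round(estimates[-1], 2))
```

Each observation is processed once and then discarded, so the cost of incorporating new data does not grow with the length of the time series. That is the property that makes this family of methods attractive for real-time forecasting.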
How do you expand your user base? What strategies do you want to try?
Alex: I have spoken and worked with a few people – in academia and business – who use geospatial analysis in their work. This has led to further interest in the tool. I have also been contacted directly by people who are interested in the tool for their problems, both in Australia and overseas. Ideally, I will be able to further build the tool using feedback from these users. I also first presented the work at PyCon in Adelaide in 2023, which was a good way to promote it directly to the open source community.
Rob: It’s difficult, perhaps especially because I’m not good at (or interested in) marketing. To date, it’s mostly been word of mouth. I often end research talks by mentioning pypfilt and highlighting that it’s freely available (open source). I recently published an article about pypfilt in the Journal of Open Source Software, which might attract some more users. Articles and interviews such as this are also great opportunities for publicity!
What best practices do you implement for quality software?
Alex: I’m not a software engineer by training but have been coding in Python since 2016. I like to keep things simple: plenty of clear comments, consistent notation which is PEP8-compliant, and incorporating clear documentation and unit tests so that any changes don’t break the existing functionality. I work on the assumption that users of open source packages are impatient (as I am), and so it should be clearly explained what the purpose of the software is. It should also be easy to install, and there should be simple examples to get you started. I’ve tried to incorporate all these in whereabouts, but I am still very much learning about what does and doesn’t work best.
Rob: In brief, I use version control (git), write lots of tests (pytest), and use continuous integration to detect when a change to the code causes a test to fail. The current version of pypfilt contains about 9,300 lines of code and about 5,500 lines of tests. I use tests to check that individual functions behave as expected, to generate multiple sets of forecasts and verify that the results are reproducible, and even to demonstrate how to use pypfilt. For example, the code in the online Getting Started tutorial is a single test that generates all of the figures in the tutorial. I find that having these kinds of examples is particularly helpful, especially for me when I haven’t used a particular feature for a while and can’t remember exactly how it works.
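The idea of testing that forecasts are reproducible can be sketched with a pair of pytest-style tests. This is an illustrative example only and is not taken from the pypfilt test suite. The toy function `simulate_forecast`, its growth model and its parameters are assumptions invented here.

```python
import random


def simulate_forecast(seed, horizon=4):
    """Toy stochastic forecast: a level that drifts randomly each week."""
    rng = random.Random(seed)  # a local generator keeps runs independent
    level = 100.0
    forecast = []
    for _ in range(horizon):
        level *= 1.0 + rng.gauss(0.0, 0.1)
        forecast.append(level)
    return forecast


def test_forecast_is_reproducible():
    # The same seed must give exactly the same forecast.
    assert simulate_forecast(seed=42) == simulate_forecast(seed=42)


def test_forecast_depends_on_seed():
    # Different seeds should give different random draws.
    assert simulate_forecast(seed=1) != simulate_forecast(seed=2)
```

Saved in a file whose name starts with `test_`, these functions are collected and run by the `pytest` command, and a continuous integration service can run the same command on every change.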
What’s next for whereabouts and pypfilt?
Alex: I’ve been working with a few different people to incorporate additional features into the package and test it in novel contexts. I’d like to grow the community and extend it to work with data from other countries. Interestingly, I’ve found that, unlike Australia, most countries don’t have good quality open address data. For example, the UK address data has to be purchased. However, there are publicly available alternatives such as OpenAddresses and OpenStreetMap. As with many data science projects, a lot of the big challenges in developing a tool such as this are with the data itself. As a result, I spend a large amount of time cleaning and validating the data, for example by checking it for internal consistency.
Rob: I have an approximately 1,600-line TODO file with ideas for improving the user experience, adding new features and inference methods, etc. There are many interesting questions and problems in the field of Sequential Monte Carlo methods, and I think pypfilt can be a useful tool for investigating some of these research avenues. I want to keep growing it into an increasingly useful toolbox of model-fitting methods.
Which research software communities do you recommend?
Alex: The RSE Association of Australia and New Zealand (RSE-AUNZ) is one that I have recently become aware of. I think that software and infrastructure are critical parts of science but unfortunately often undervalued in academia, so I am grateful that there are awards such as this one to raise the profile of such work. Judicious use of social media is a good way to find people who are developing software that supports research. Bluesky and LinkedIn have strong communities of open source developers. Some names that come to mind (not all software developers) include Hamel Husain, Danielle Navarro, Frank Harrell, Simon Willison, Richard McElreath and Hadley Wickham. The open source community is very active and positive, made up of people developing software tooling to improve the way research is done. Seeing what others are producing, reading their code and blog posts, and viewing their presentations have all helped me improve my own work.
Rob: The RSE Association of Australia and New Zealand (RSE-AUNZ) is the first community that comes to mind when I think about RSE. With a specific focus on infectious diseases, I also have to mention the Australian Consortium for Epidemic Forecasting and Analytics (ACEFA). Finally, Git Is My Lab Book is a set of online training materials that I’ve been developing in collaboration with peers across our national and international infectious disease modelling networks.
Keep In Touch
You can connect with Alex via GitHub, his University of Melbourne profile, and his personal website.
You can connect with Rob via GitHub, his University of Melbourne profile, his personal website, and Mastodon.
If you’d like to be part of a growing community of RSEs in Australia, become a member of RSE-AUNZ – it’s free!
Hear More about whereabouts and pypfilt
The Statistical Computing and Visualisation section of the Statistical Society of Australia is proudly presenting the Venables Award webinar. Join the webinar to hear more about whereabouts and pypfilt.
- Date: 18 March 2025
- Time: 12 noon to 1 pm (AEDT)
- Location: online
Eureka Prize for Excellence in Research Software Open
The ARDC is proud to sponsor awards for research software and research software engineers in all stages of their careers. The goal of the awards is to strengthen the recognition of research software and those who develop and maintain it as being vital to modern research.
The ARDC continues to sponsor research software awards for 2025, including the Australian Museum Eureka Prize for Excellence in Research Software. Sponsored and presented by the ARDC, it is awarded for the development, maintenance or extension of software that has enabled significant new scientific research.
2025 timeline
- Entries open: Tuesday 11 February
- Entries close: 7 pm (AEST), Monday 14 April
- Finalists announced: Thursday 31 July
- Winners announced at the 2025 Eureka Prizes Award Ceremony: Wednesday 3 September
Who should apply
Developers and maintainers of research software
Previous winners and finalists
2024
- Winner: Professor Gordon Smyth
- Finalists: MiniZinc, MRtrix3
2023
- Winners: Dr Minh Bui and Professor Robert Lanfear
- Finalists: GPlates, mixOmics
Further information
Learn more about this award on the Australian Museum website.
The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.