Shaping Research Software: An Interview with Matthew Sainsbury-Dale and Andrew Zammit-Mangion

We spoke with Matthew Sainsbury-Dale and Andrew Zammit-Mangion, who won the 2023 ARDC-sponsored Venables Award for their open-source statistical software, which is used to make predictions in soil studies, criminology and many other fields.
Matthew Sainsbury-Dale and Associate Professor Andrew Zammit-Mangion won the ARDC-sponsored 2023 Venables Award for New Developers of Open Source Software for Data Analytics, presented by the Statistical Society of Australia.

As part of our Research Software Agenda for Australia, the ARDC is working with the research community to shape better research software in order to recognise it as a first-class output of research. This interview is part of a series about research software engineers in Australia. Each month we talk to a leading research software engineer about their experiences and best-practice tips in creating, sustaining and improving software for research. 

Continuing the series, we spoke with Matthew Sainsbury-Dale, a PhD candidate at the University of Wollongong’s (UOW) School of Mathematics and Applied Statistics, and his supervisor Associate Professor Andrew Zammit-Mangion. They won the ARDC-sponsored 2023 Venables Award for New Developers of Open Source Software for Data Analytics, presented by the Statistical Society of Australia (SSA), for their work on Fixed Rank Kriging (FRK), an R software package for spatial and spatio-temporal modelling and prediction.

Tell us about your background and academic interests.

Matthew: I’m a PhD candidate at UOW under the supervision of Andrew and Distinguished Professor Noel Cressie. I completed my undergraduate degree in Mathematics at UOW in 2019 and started my PhD in 2020. In 2022, I spent 6 months as a visiting student at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia, working with Associate Professor Raphaël Huser and his team. 

My research interests lie primarily in spatial and spatio-temporal statistics, statistical deep learning (particularly the use of deep learning to facilitate parameter inference), statistics of extremes, and the development of statistical software. 

Andrew: I completed my undergraduate degree in Electrical Engineering at the University of Malta in 2007 and my PhD at the University of Sheffield’s Department of Automatic Control and Systems Engineering in 2012. My PhD introduced me to system identification in engineering, which is essentially statistical modelling for dynamical systems. Over the years, I transitioned into methodological and applied statistics, which is what characterises the majority of my work today.

I specialise in spatio-temporal statistics, which focuses on the modelling and prediction of spatio-temporal data, that is, data that can be referenced by both space and time. My focus in the last year has been on applications that involve remote sensing data (i.e. observations made using instruments on satellites). In recent years, I’ve also been looking at ways AI, particularly deep learning, can be used to facilitate spatio-temporal modelling. We’re finding that deep learning can be very useful by adding flexibility to our models and by helping to estimate model parameters.

What are some of the projects you’ve worked on?

Matthew: I’ve worked on two main projects during my research. My first project involved expanding the software package FRK, which has won me the Venables Award. It’s designed to model and predict from large spatial or spatio-temporal datasets. The primary motivations for this expansion were to cater to non-Gaussian data like count data, which frequently arise in spatial settings, and to upscale the package to allow for complex models that can more accurately model the underlying process in some applications. My second project is on likelihood-free parameter estimation using neural networks.

Andrew: I have worked on quite a few projects, ranging from modelling and predicting irregular warfare to quantifying Antarctica’s contribution to sea level rise from satellite data. The most recent project I worked on considers the estimation of carbon dioxide sources and sinks from satellite data. The framework has been used in a study to support the UN’s Global Stocktake and is the recipient of an award by a section of the American Statistical Association.

Distinguished Professor Noel Cressie, Matthew Sainsbury-Dale (PhD candidate) and Associate Professor Andrew Zammit-Mangion, all from the University of Wollongong, at a meeting in Colorado, US

Tell us more about FRK. How was it conceived and what applications does it have?

Andrew: In a nutshell, FRK can be used to make predictions of a process at spatial locations and time points that have not been directly observed. “FRK” is short for “Fixed Rank Kriging”, a term coined by Noel Cressie and Gardar Johannesson back in 2008. The package involves modelling a spatial or spatio-temporal process as a sum of elementary basis functions and then estimating, and quantifying uncertainty over, the coefficients of these basis functions using data.
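As a rough illustration of the idea Andrew describes, the sketch below writes a process as a weighted sum of basis functions, estimates the weights from noisy data, and predicts (with uncertainty) at unobserved locations. It is written in Python rather than R, and it is not the FRK package's API or implementation. The function names (`gaussian_basis`, `fit_and_predict`), the one-dimensional domain, the Gaussian-bump basis, the simulated data and the fixed variances are all assumptions made for the example; FRK estimates such quantities from the data and works in space and time.

```python
import numpy as np

def gaussian_basis(locations, centres, scale):
    """Evaluate r Gaussian-bump basis functions at n 1-D locations: (n, r) matrix."""
    diff = locations[:, None] - centres[None, :]
    return np.exp(-0.5 * (diff / scale) ** 2)

def fit_and_predict(obs_locs, obs, pred_locs, centres, scale, prior_var, noise_var):
    """Predict under Z = S @ eta + noise, with eta ~ N(0, prior_var * I)
    and noise ~ N(0, noise_var * I). Variances are fixed (an assumption)."""
    S_obs = gaussian_basis(obs_locs, centres, scale)
    S_pred = gaussian_basis(pred_locs, centres, scale)
    r = centres.size
    # The matrix to invert is r x r (number of basis functions), not n x n
    # (number of observations); this is the "fixed rank" saving.
    post_prec = S_obs.T @ S_obs / noise_var + np.eye(r) / prior_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (S_obs.T @ obs) / noise_var
    pred_mean = S_pred @ post_mean
    # Pointwise prediction variance of the process: diag(S_pred post_cov S_pred^T)
    pred_var = np.einsum("ij,jk,ik->i", S_pred, post_cov, S_pred)
    return pred_mean, pred_var

# Hypothetical data: noisy observations of sin(s) at 200 irregular locations
rng = np.random.default_rng(0)
obs_locs = np.sort(rng.uniform(0.0, 10.0, size=200))
obs = np.sin(obs_locs) + rng.normal(0.0, 0.1, size=200)

centres = np.linspace(0.0, 10.0, 15)    # 15 basis functions
pred_locs = np.linspace(0.5, 9.5, 50)   # locations not necessarily observed
pred_mean, pred_var = fit_and_predict(
    obs_locs, obs, pred_locs, centres, scale=1.0, prior_var=10.0, noise_var=0.01
)
```

Here `pred_mean` plays the role of the prediction map and `pred_var` the associated uncertainty. The cost of the fit is driven by the 15 coefficients rather than the 200 observations, which is why this style of model scales to large datasets.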

When I started working on version 1 of FRK (FRK v1, described in an accompanying manuscript) together with Noel in 2015, the methodology had already been around for a few years. However, its implementation requires an understanding of linear algebra, statistics and statistical computing. Noel and I felt there was a need for a software package that anybody with some basic programming skills could use to analyse spatial and spatio-temporal data. This first version took us a few years to complete but has been widely used by the scientific community since its release. I think people like it because it’s flexible, fast and easy to use when compared to existing software. It is also freely available.

But FRK v1 had a few limitations. While it could handle big data, it could not handle “big models” in that it was computationally limited to only a few thousand basis functions. This made the model too inflexible for some applications. Moreover, it could not handle non-Gaussian data, which is very common in spatio-temporal applications. An example of this kind of data is the number of sick people in small areas in an epidemiological study. We felt the need for this functionality, and thus was born version 2, which eventually won us the Venables Award.

In 2020, Matt started his PhD with me on spatio-temporal modelling. I highly value computing skills in PhD students: Matt had them and was keen to develop them further during his studies. He was therefore ideally placed to investigate ways to extend FRK v1 while, importantly, retaining the package’s ease of use and speed. 

The revamp involved two main challenges. First, we needed a new estimation methodology that caters for non-Gaussian data. Second, we needed an improved capacity to handle a large number (over 10,000) of basis functions. The solution lay in leveraging functionality in the open-source software package TMB (Template Model Builder). TMB allows one to predict latent random effects from non-Gaussian data using the Laplace approximation. And by constructing the model with sparse precision matrices, it can be used with over 10,000 basis functions. Transitioning from FRK v1 to v2 was an arduous process, since we had to weave new code into what was already a substantial code base, all the while ensuring backward compatibility. This project was started in early 2020, and the first version of the paper describing FRK v2 was released in October 2021. The paper describes the methodology underlying FRK v2 and illustrates its use through case studies on contaminated soil in Nevada, poverty rates in Sydney and crime counts in Chicago. The article has been provisionally accepted for publication in the Journal of Statistical Software (JSS).
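The Laplace approximation mentioned above can be sketched in a few lines. The example below is a hand-rolled illustration in Python, not TMB and not the code inside FRK v2. It assumes Poisson counts with a log link, indicator basis functions (one per hypothetical area), and a tridiagonal random-walk precision matrix `Q`; the function names and all numerical values are made up for the example. It finds the posterior mode of the coefficients with Newton's method and uses the curvature at the mode as the approximate posterior precision.

```python
import numpy as np

def neg_log_post(eta, S, y, Q):
    """Negative log posterior (up to a constant): Poisson counts, Gaussian prior."""
    lin = S @ eta
    return np.sum(np.exp(lin) - y * lin) + 0.5 * eta @ Q @ eta

def laplace_approximation(S, y, Q, max_iter=50, tol=1e-8):
    """Return the posterior mode of eta and the Hessian (approximate posterior
    precision) at the mode, found by Newton's method with step halving."""
    eta = np.zeros(S.shape[1])
    for _ in range(max_iter):
        mu = np.exp(S @ eta)
        grad = S.T @ (mu - y) + Q @ eta
        if np.max(np.abs(grad)) < tol:
            break
        hess = S.T @ (mu[:, None] * S) + Q
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase
        t, current = 1.0, neg_log_post(eta, S, y, Q)
        while neg_log_post(eta - t * step, S, y, Q) > current and t > 1e-8:
            t *= 0.5
        eta = eta - t * step
    mu = np.exp(S @ eta)
    hess = S.T @ (mu[:, None] * S) + Q
    return eta, hess

# Hypothetical data: 400 count observations spread over 20 areas
rng = np.random.default_rng(1)
r, n = 20, 400
area = rng.integers(0, r, size=n)
S = np.zeros((n, r))
S[np.arange(n), area] = 1.0               # indicator basis: one column per area
true_eta = np.sin(np.linspace(0.0, np.pi, r)) + 0.5
y = rng.poisson(np.exp(S @ true_eta))

# Tridiagonal precision: neighbouring coefficients are encouraged to be similar.
# Stored densely here for brevity; at scale it would be held in a sparse format.
D = np.diff(np.eye(r), axis=0)            # (r-1, r) first-difference operator
Q = 5.0 * D.T @ D + 0.1 * np.eye(r)

eta_hat, hess = laplace_approximation(S, y, Q)
post_sd = np.sqrt(np.diag(np.linalg.inv(hess)))   # approximate posterior std devs
```

Because both `Q` and the data term are mostly zeros, the Hessian solved at each Newton step is sparse. That sparsity, when exploited by suitable linear algebra routines, is what makes tens of thousands of coefficients tractable.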

The observed number of crimes in 76 community areas of Chicago in 2019 (first from left) compared to the one-year-ahead predictions (second from left) with the associated prediction uncertainty (third from left) computed using FRK v2 (Sainsbury-Dale, Zammit-Mangion and Cressie, 2022)

FRK has been applied in various other ways. For example, it has been used for mapping soil moisture by agricultural researchers at the Aristotle University of Thessaloniki in Greece, mapping oil and gas production by researchers at Texas Tech University, and, closer to home, for studying coral reef health. Its widespread adoption testifies to its low barrier to entry.

How have you worked together on FRK?

Matthew: FRK v2 was quite a daunting first software project, given its scope and number of moving parts – it was something of a trial by fire. Fortunately, Andrew was able to guide me well through the code development process, and I learned a great deal over the course of the project. In particular, the project taught me the importance of good coding practice (e.g. documenting and cleaning the code, version control) and how these practices can lead to better software and save a lot of headaches. Andrew and I were in regular communication during the project, either during our weekly meetings or via email. I found these regular discussions and feedback very helpful.

Andrew: It’s been great working with Matt on the FRK software package. He had a challenging project: to develop the code required to solve FRK v1’s limitations while understanding in depth an existing and mature codebase and considering how it can be modified without breaking backward compatibility. And we must not forget about the underlying statistical methodology, which is sophisticated and comes with its own mathematical challenges that need to be sorted out prior to software implementation. The input of Noel, who is the third member of this “FRK team”, was particularly valuable in this respect.

It’s fair to say it was intense. We regularly discussed how to change things in the software, right down to what we should name certain variables, with most of this done during the pandemic. We collaborated publicly via GitHub. Matt did most of the coding work, and I provided guidance when needed, but the version 2 code of FRK is very much his own.

I did a lot of testing to make sure the software is working as it should from an end user’s perspective, and I’m very happy with the result. Considering that this was the first serious software project Matt has worked on, I’m also very impressed with it. It was also the first experience for me working with someone so closely on a piece of software that we both knew could have a tangible impact. It was very rewarding and is something I’d like to do again.

What does the Venables Award mean to you?

Matthew: It’s great to have recognition for software. So much work is required to produce high-quality, well-documented software, but this work is often underappreciated since research papers are the primary metric by which researchers are measured. For researchers of statistical methodology, there’s of course an indirect incentive to produce high-quality software: the availability of software for a given methodology affects its impact and, hence, the number of citations the research paper will receive. However, it’s also important to directly encourage and recognise researchers for their contributions to software development. I’m very proud to receive the Venables Award, which certainly encourages me to continue to develop statistical software. 

Andrew: I see software as a very important component of research in statistical methodology. A problem currently faced by researchers is that software is not always seen or recognised as research. But anyone who has coded a big statistical software package will tell you that often a lot of research goes into producing good-quality software. Software is also a research enabler: FRK, for one, has been used as a component in new statistical methodology as well as to deliver insights in various scientific disciplines. The Venables Award is a great initiative as it supports new developers in seeing the value of software. All too often in statistics, academia’s focus is on novel ideas rather than novel tools – the paucity of academic journals based on research software is a strong indicator of this. But this is changing slowly, and the Venables Award is an incentive that accelerates this change. 

Personally, I feel very privileged to be given this award in a community full of excellent software developers who’ve also inspired and supported my work over the years.

What communities are you part of and do you recommend?

Matthew: I’m involved with the Statistical Society of Australia (SSA). Their weekly newsletter helps me to keep up to date with statistics events and news in Australia.  

Andrew: I am a member of many societies, including the SSA, the American Statistical Association, the International Statistical Institute, and the International Society for Bayesian Analysis. All these societies have newsletters or forums that keep me up to date with ongoing developments in statistics and data science, including software-related news. I also regularly follow the Posit Blog, which keeps me up to date with developments in the wider R ecosystem.

Keep In Touch

You can connect with Matthew via LinkedIn and GitHub.

You can connect with Andrew via LinkedIn, GitHub and Twitter. You can also learn more about him and his work on his personal website, his fellowship project website, and through his UOW profile.

If you’d like to be part of the growing community of research software engineers in Australia, become a member of the RSE Association of Australia and New Zealand (RSE-AUNZ) – it’s free! 

Research Software Award Updates

The winner of this year’s ARDC Award for New Developers of Open Source Software in Ecology was announced at the 2023 Ecological Society of Australia Conference. It was awarded to CSIRO’s Dr Roozbeh Valavi for the R package blockCV, developed at the University of Melbourne’s Quantitative and Applied Ecology Group. Congratulations, Roozbeh!

Presented by the ESA, the award supports efforts to develop and share methodology, models and data in Australia’s ecological communities. Learn more on the ESA website.

Learn more about the ARDC’s Research Software Agenda for Australia.

The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.
