Shaping Research Software: An Interview with the dartR Team

We spoke with the team behind dartR, the R software package that enables analysis of large and informative multi-purpose genomic datasets. The team won the 2025 ARDC-sponsored Eureka Prize for Excellence in Research Software.
The dartR team at the 2025 Australian Museum Eureka Prizes award ceremony. From left: Dr Carlo Pacioni, Professor Bernd Gruber, Dr Luis Mijangos-Araujo, Emily Stringer, Ching Ching Lau, Dr Diana Robledo-Ruiz, Professor Arthur Georges. Photo: Mel Koutchavlis / Australian Museum

This ARDC series aims to drive recognition of research software and its authors. Each month, we talk to leading actors in the research software engineering (RSE) space and share their experience creating, sustaining and improving software for research. 

This month, we spoke with the dartR team, led by the University of Canberra (UC). The dartR team won the prestigious 2025 Eureka Prize for Excellence in Research Software, sponsored by the ARDC. The team is diverse in age, gender and ethnic background, and its members come from various institutions, each bringing the commitment and energy to make this package a success.

Professor Bernd Gruber (UC) is the project lead and the original creator of the dartR software, together with Distinguished Professor Arthur Georges (UC). 

The initiative got a boost when Dr Luis Mijangos joined the team as a full-time developer, funded by the ACT Government’s Priority Investment Program. He is now with Diversity Arrays Technology, where he continues to work on dartR. 

Since then, the team has grown to include:

Postdoctoral Research Fellow Dr Emily Stringer (UC) recently joined the team on a full-time basis to assist with preparing workshop materials and developing an eBook on dartR.

What was dartR initially developed for?

The dartR software package grew organically when a small group of researchers realised that the effort they were putting into their analyses could be generalised into a set of functions for a broad audience. 

Recent developments in sequencing and the commercial services available for genotyping using single-nucleotide polymorphisms (SNPs) mean that complex laboratory skills are no longer required to generate large and valuable datasets. 

The pinch point for researchers shifted to the computational capacity to work with these large datasets. dartR removes this impediment for the general biologist or ecologist with questions that can be addressed using SNP datasets. The initiative quickly snowballed as more and more of our colleagues came to appreciate the R scripts that we were making available through dartR and the widely used CRAN repository.

Genetic data can be generated from any organism, making the applicability of dartR vast. It’s used by people working in agriculture, fisheries, pastoralism and captive breeding programs for endangered species, as well as by those studying ecological and evolutionary questions to understand biodiversity. This broad applicability and the growing demand for easy-to-use software are essential ingredients in the success of dartR.

What was the team’s approach?

Early on, we decided not to reinvent the wheel by duplicating the many existing SNP analysis software options, both in R and elsewhere. Instead, dartR is designed to be a bridge, providing simple scripts to connect seamlessly with those third-party tools. This way, we act as a one-stop shop for SNP analysis across the entire software landscape, without duplicating the hard work of others in genomics.

Another key decision was to ensure that, even though dartR grew out of a close partnership with the genotyping company Diversity Arrays Technology, it’s fully accessible to people who generate their SNP datasets through other providers or in their own labs. That means everyone can tap into the power and flexibility of dartR.

What are the applications of dartR that you are aware of?

For us, the most exciting thing about dartR is how far it’s travelled beyond our original idea of ‘population genetics in R’. We now see it used across conservation, agriculture, aquaculture, evolutionary biology and even pathogen genomics – basically anywhere people are trying to make sense of SNP data.

In conservation, dartR helps assess genetic diversity and inbreeding, define management units, measure connectivity and fragmentation, and design translocations and captive-breeding or reintroduction programs for threatened and invasive species. Those analyses feed directly into recovery plans and on-the-ground management decisions.

In agriculture and aquaculture, dartR is used to monitor diversity in breeding lines, manage gene banks and support selection for traits such as productivity and climate resilience.

And then there is teaching and training: dartR has become a backbone of workshops and university courses, letting students and practitioners move from raw SNP data through quality control, structure, relatedness and selection scans within a single, reproducible R workflow.
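As a rough illustration of the kind of single-script workflow described above, the sketch below runs one quality-control step and one population-structure step. It assumes the dartR package and its bundled example dataset testset.gl are installed; the 95% threshold is an arbitrary value chosen for illustration, and function names may differ between dartR versions and the dartRverse packages.

```r
# Sketch only: assumes the dartR package and its example dataset
# testset.gl are available. The threshold is illustrative, not recommended.
has_dartR <- requireNamespace("dartR", quietly = TRUE)

if (has_dartR) {
  library(dartR)

  gl <- testset.gl  # example genlight object distributed with dartR

  # Quality control: keep loci scored in at least 95% of individuals
  gl <- gl.filter.callrate(gl, method = "loc", threshold = 0.95)

  # Population structure: principal coordinates analysis, then an ordination plot
  pc <- gl.pcoa(gl)
  gl.pcoa.plot(pc, gl)
} else {
  message("dartR is not installed; skipping the example.")
}
```

Keeping every step in one script like this is what makes the analysis easy to rerun and share.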

How do new users get to interact with this tool?

To help new users engage with dartR, we offer several avenues. 

First, our dartR scripts are well documented, and this documentation can be accessed using the R and RStudio help tools. 

Second, we have prepared a series of tutorials structured on sound pedagogical principles to assist in weaving one’s way through the options appropriate to the questions being asked and the analyses that need to be undertaken. These tutorials are available on our website dartR.biomatix.org. AI-generated podcasts cover the theory in audio form – for the nerdy among us who like listening to such things during their morning walks.

We also organise specialist workshops on the applications of SNP data, drawing on nationally and internationally recognised researchers as presenters. These workshops are essential for students and early-career researchers and provide opportunities for them to use dartR to streamline their analyses.

These workshops are an essential avenue for recruiting new developers to the dartR team. For example, Diana Robledo-Ruiz presented at a workshop on her innovative approaches to identifying and filtering sex-linked SNP markers. On publication of her approaches, she joined the dartR team and implemented these, making them available to a wider community.

How do you engage with the dartR community?

We have a very active discussion group running on Google Groups, where users can post their questions, suggest solutions and make suggestions for improving the functionality of dartR. Luis Mijangos, in particular, is very active in responding quickly to bug reports and questions, which is something that is highly valued according to our user surveys.

Our GitHub repository is also a well-used avenue for seasoned users to contribute ideas for improving the package.

Users often identify key gaps in dartR’s functionality and we support them by developing new functions to meet their needs. In some cases, users contribute their own functions which we adapt to align with dartR’s formatting and documentation standards before integrating them into the package.

Many of our users are also clients, in the sense that we work with them on a contractual basis to deliver analyses and outcomes relevant to their work. This sort of partnership is a very productive source of new code and capability for the dartR package.

Branding is critical, and we put a lot of effort into the presentation and branding of dartR in workshops, seminars and other fora. The Eureka Prize for Excellence in Research Software, supported by ARDC, gave us a major boost in this regard. Success breeds success, and much of the support we have received comes from entities that see value in being associated with a success story rather than more tangible outcomes.

What are the next goals for dartR? How do you see the project evolving in the coming years?

dartR continues to grow as we each encounter new challenges in our research. This is a strength because new functions are being developed in an active research context, ensuring quality. But the downside is that it follows the developers’ research interests, and some critical areas can fall through the cracks. We are actively addressing this by identifying those gaps and recruiting new developers to work in those areas.

We will continue to work through CRAN, not least because it imposes discipline on our programming. But we are hitting limits and have addressed this by moving dartR from a single package to a suite of packages badged as the dartR universe, or dartRverse. This provides clearer avenues for developers and developer teams to hang their hat on specific aspects of dartR with accompanying publications.

We risk being overtaken by the ever-accelerating rate of data generation, and we may soon be working not with tens or hundreds of thousands of SNP markers, but with millions. Datasets with thousands of individuals, each scored for millions of SNPs, will quickly exceed the capacity of our existing analysis approaches. We are moving early to address this with behind-the-scenes approaches to paging data in and out of memory that will be transparent to the user.
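To make the idea of paging concrete, here is a minimal base-R sketch of block-wise processing. It is an illustration only and not dartR's implementation: the simulated genotype matrix, the function call_rate_chunked and the reader read_block are all hypothetical names introduced for this example. The per-locus call rate is computed one block of loci at a time, so only one block needs to be in memory at once.

```r
# Illustration only: not dartR's implementation. All names are hypothetical.
# Simulated genotypes: individuals in rows, loci in columns,
# coded 0/1/2 with NA for a missing call.
set.seed(1)
n_ind <- 50
n_loc <- 1000
geno <- matrix(
  sample(c(0L, 1L, 2L, NA), n_ind * n_loc, replace = TRUE,
         prob = c(0.4, 0.3, 0.2, 0.1)),
  nrow = n_ind
)

# Compute the call rate per locus by visiting the loci in blocks.
# read_block(idx) returns the genotype columns for the loci in idx.
call_rate_chunked <- function(read_block, n_loc, chunk_size = 100) {
  out <- numeric(n_loc)
  for (s in seq(1, n_loc, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, n_loc)
    block <- read_block(idx)              # only this block is held at once
    out[idx] <- colMeans(!is.na(block))   # proportion of non-missing calls
  }
  out
}

# Here the "reader" slices an in-memory matrix; a real reader would
# fetch the same columns from a file on disk instead.
read_block <- function(idx) geno[, idx, drop = FALSE]

cr <- call_rate_chunked(read_block, n_loc)
```

Because the caller only ever sees the final vector, the chunking stays hidden behind the function, which is the sense in which such an approach can be transparent to the user.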

Finally, we would like to formalise our tutorials by creating two eBooks, one an introduction to dartR and the other on advanced topics in population genetics using dartR. We want to take an innovative approach to this, with a skeletal main text linked to supporting materials that range from worked examples in Markdown, through AI-generated elaborations, to AI-generated fireside chats on the relevant theory. The eBook itself can be published, but it will link to flexible materials that can be updated continually as the dartR package evolves.

Emily Stringer has joined the team on contract to work up these materials in collaboration with the other developers. This is an exciting initiative that will lead to greater uptake of dartR in undergraduate and graduate courses in the higher education sector. It is already being used in classwork in Australia and elsewhere.

How does the dartR team collaborate? What practices have helped foster a strong team culture?

We recognise the value of collaborating to make this capability more widely accessible to the broader research community. We meet weekly, and feed off each other’s enthusiasm and ideas. 

  • We have Dr Luis Mijangos, who brings imagination and great enthusiasm to the project and takes the lead in interacting with users on the Google Groups forum.
  • We have Dr Carlo Pacioni from the Arthur Rylah Institute in Victoria, who has long worked to bring genetics into play in wildlife management.
  • Dr Diana Robledo-Ruiz and Floriaan Devloo-Delva have specialised in identifying sex-linked SNP markers and exploring their impact on analyses if not adequately accounted for.
  • Prof Oliver Berry heads up the Environomics Futures Initiative in CSIRO and has a background in metapopulation genetics and environmental DNA. Among his contributions is being able to step back and advise the team on how best to present ourselves and our contributions to industry and the community.
  • Dr Renee Catullo from the University of Western Australia is a geneticist with a deep theoretical understanding who has contributed by clarifying theoretical challenges in dartR implementations and proposing solutions.
  • Dr Jesus Castrejon-Figueroa is a physicist who has put his intellectual grunt to work resolving some of the more challenging aspects of novel analysis approaches found only in dartR.
  • In a similar vein, Dr Peter Unmack has a comprehensive set of challenges in freshwater fish conservation that continually present opportunities to improve and expand the capacity of dartR.
  • Dr Eric Archer’s work focuses on using molecular tools to address questions of life history, population structure, and taxonomy in cetaceans.

Somehow, this motley crew comes together to weave the magic that is dartR – many in their own time.

dartRverse Workshop 2026

The dartR team is hosting a hands-on workshop on harnessing R for conservation genomics at the ANU Kioloa Coastal Campus and online from 8 to 14 March. It is designed for ecologists, conservation biologists, early-career researchers, students and anyone ready to master population genetics in R. Learn more and register by 16 February.

Keep In Touch

You can connect with the dartR team:

If you’d like to be part of a growing community of RSEs in Australia, become a member of RSE-AUNZ – it’s free!

ARDC-Sponsored Eureka Prizes

Sponsored and presented by the ARDC and judged by an independent panel, the Australian Museum Eureka Prize for Excellence in Research Software was awarded for the development, maintenance or extension of software that has enabled significant new scientific research. The other 2 finalists for the 2025 prize were mixOmics and napari.

Read our interviews with the winners and finalists in previous years:

2024

2023

From 2026 to 2028, the ARDC is sponsoring a new Eureka Prize for Excellence in Data Platforms. This new prize will celebrate the development and ongoing maintenance of a data platform that has enabled significant scientific research. Data platforms are online environments where researchers can find, access and use high-quality, well-curated data effectively. They make data as open and readily usable as possible, and closed only where necessary. Like research software, data platforms are key ingredients for modern research. This new award aligns with our efforts to build thematic research data commons, which provide researchers with the data and associated tools and services they need for groundbreaking research and decision-making. Learn more and enter by 7 pm (AEST), Thursday 16 April.

We’d like to thank all those who entered and were involved in the Eureka Prize for Excellence in Research Software. The ARDC continues to sponsor awards for research software and research software engineers in all stages of their careers. Learn about and enter our other sponsored research software awards.

2026 International Research Software Engineering (RSE) Survey Now Open

The Software Sustainability Institute has launched the 2026 International RSE Survey to help better understand what research software engineers need and how the community can be supported. The survey produces an incredibly valuable trove of data that anyone can use to understand the RSE community, including national associations, funders and policymakers. Complete the survey by Friday 20 March.

The ARDC is funded through the National Collaborative Research Infrastructure Strategy (NCRIS) to support national digital research infrastructure for Australian researchers.