This ARDC series aims to drive recognition of research software and its authors. Each month, we talk to leading actors in the Research Software Engineering (RSE) space and share their experience creating, sustaining and improving software for research.
This month we talked with Dr Emi Tanaka, an Applied Statistician at the Australian National University (ANU) Research School of Finance, Actuarial Studies and Statistics. She is also the ANU lead for the Analytics for the Australian Grains Industry (AAGI) project partnership. Her primary interest is to develop impactful methods and tools that can be readily used by practitioners. She is a strong advocate of open science and an avid research software engineer.
How do you use your skills in statistics and software development to help solve problems in the Australian grains industry?
Analytics for the Australian Grains Industry (AAGI) is a relatively new project partnership for ANU, having officially commenced in December 2024. We are still in the early stages of the AAGI project, with the last of our three postdoctoral researchers joining the team in July. So far, we’ve run several training sessions for the postdocs, leveraging my experience with the Statistics for the Australian Grains Industry (SAGI) project – AAGI’s predecessor – to ensure best statistical practices in plant improvement programs. Since my time with SAGI, I have gained additional skills in software development, machine learning, and large language models. Integrated with my statistical expertise, these are set to bring about exciting developments. I believe that it’s in this interdisciplinary space that real innovations have the potential to flourish.
We also make the most of technology, using version control systems like Git and GitHub to track projects and enhance collaboration. Tools like Zulip will allow us to casually discuss the math and code within the team, which traditional communication methods like email often struggle with. Moreover, we’re focused on developing software packages in the R programming language and creating web applications to promote better analytical practices. I’m enthusiastic about the strides we’ll make in the next few years! To learn more, visit the ANU-AAGI website.
Could you describe a bit of your work as a team lead?
For most tasks, I typically take the initiative to self-assign and push forward in a direction that I believe is both responsible and beneficial for the group. This approach is guided by my strong conviction that, as data professionals or data academics, we should exemplify high standards in data-driven decision-making and data practices. I dedicate significant effort to developing and testing infrastructure with a focus on long-term sustainability. As a result of these efforts, our workshop materials are now version-controlled and publicly accessible on our organisational GitHub profile.
While the infrastructure isn’t perfect and needs to evolve in response to collective needs and team input, improvement is an iterative process. I believe that taking the initial steps is crucial for progress. My goal is to build upon what we have and ensure that our infrastructure can sustainably operate beyond any single individual.
I’m grateful for colleagues like Jiajia Li, who are dedicated to making our workshop materials accessible on our organisational GitHub profile. Ultimately, infrastructure is only valuable if people use it, and I believe it should be designed such that infrastructure adapts to the benefit of people, rather than people adapting to the benefit of infrastructure. I also want to expand our activities to create an environment where early-career researchers can thrive, whether through seminars for scientific discussion, hacky hours for coding skills enhancement, or social events for team bonding. So still a lot more to do!

Tell us about your team, their backgrounds and projects.
For ANU-AAGI, the team includes esteemed academic statisticians like Alan Welsh and Francis Hui, recipients of the prestigious Australian Academy of Science Hannan Medal and Chris Heyde Medal, respectively. They contribute substantial rigor and both theoretical and practical expertise in statistics, particularly in mixed models (also known as multilevel models, hierarchical models, and random effects models), which are widely used in the biological sciences. Leading ANU-AAGI, I serve as the connector between statistics, computing, and plant sciences, uniting the team’s efforts to contribute to AAGI.
The most exciting aspect of our team is the early-career researchers and students, who inject energy into propelling ideas, challenging conventions, and embracing new approaches. Our postdoctoral researchers – Weihao (Patrick) Li, Fonti Kar, and Yidi Deng – are set to undertake projects in machine learning and computer vision, experimental design and statistical modeling, and statistical bioinformatics, respectively. While each of them brings their own strengths, they share common traits: a strong dedication to reproducible practices and a commitment to developing open-source software packages that enable others to apply their methods.
Creating software, let alone writing documentation, is an unconventional research output and far from being mainstream. Their dedication to this endeavour really underscores their forward-thinking mindset. Also, despite having no formal training in software development, they have shown remarkable adaptability and drive to create impactful tools. I’m looking forward to working with them and to what we will achieve together in the next couple of years!
You’re known as a strong advocate for open science and reproducible research. How do you promote these values within the AAGI project, especially when navigating collaboration with industry partners who may have concerns about openness or data sharing?
Firstly, I believe in leading by example. I can’t expect others to commit to open science and reproducibility if I don’t practice it myself. Secondly, it’s important to set this as the standard. While committing to open science and reproducibility requires significant effort often without immediate rewards, I have enough intrinsic motivation to pursue these practices because, in my opinion, it’s simply the scientifically responsible thing to do.
While most people I know agree with this in principle, putting it into practice is less common than it should be. So, I think some individuals initially require extrinsic motivation. I strive to hold those around me to a higher standard, and once they develop these habits, I believe momentum will naturally sustain them.
I understand that not everything can be made open, especially when dealing with sensitive, commercial, or confidential data. It’s crucial to be reasonable and respect the concerns of data custodians, while still promoting open science for societal benefit as much as possible. When data sharing presents concerns, there are ways to address them, such as de-identifying data, creating synthetic data that emulate the real data’s statistical properties, or implementing an embargo period for data release. In collaborations, it’s important to first listen to concerns and then engage in constructive discussions on how to address these specific issues without undermining the advantages of open science and reproducible research.
What strategies do you use to encourage industry partners to contribute to open-source tools and workflows, particularly when they are more familiar with commercial or proprietary software ecosystems?
I believe it’s crucial to understand the concerns of our industry partners and, where applicable, educate them about the benefits of open-source software. By making software open-source, we enhance transparency and create opportunities for the community to contribute to its improvement. This approach ultimately benefits not only the scientific community but also our industry partners. Additionally, data — such as the TIOBE index, which tracks programming language popularity — indicates that open-source languages are leading in terms of usage.
How do you keep learning?
I believe the best way to learn is through hands-on experience. In data analysis, I strive to adopt approaches that are more effective in practice, even if they take longer to implement initially. Of course, there are times when tight deadlines require a more pragmatic approach. However, when time permits, practice truly does lead to improvement. So, keep at it and don’t give up!
Keep In Touch
You can connect with Emi via LinkedIn, GitHub, and her personal website.
If you’d like to be part of a growing community of RSEs in Australia, become a member of RSE-AUNZ – it’s free!
Research Software News
Finalists Announced for 2025 Eureka Prize for Excellence in Research Software
Finalists for the 2025 Australian Museum Eureka Prize for Excellence in Research Software have been announced:
dartR, mixOmics and napari.
Sponsored and presented by the ARDC, the prize is awarded for the development, maintenance or extension of software that has enabled significant new scientific research. The winner will be announced at the 2025 Eureka Prizes Award Ceremony on Wednesday 3 September. Read more about the finalists and register to watch the ceremony.
The ARDC is proud to sponsor awards for research software and research software engineers in all stages of their careers. The goal of the awards is to strengthen the recognition of research software and those who develop and maintain it as being vital to modern research.
OceaniaR Hackathon
Keen on collaborating in person on R-focused projects for social good, with some of the leading developers in the R community? The OceaniaR Hackathon will be held at ANU, Canberra on 23 November 2025, just before the 2025 Biometrics in the Bush Capital (BIBC) conference. Register now to join the OceaniaR Hackathon.
If you are keen to build your R or communication skills, check out some of the pre-BIBC workshops, including generalised non-linear models, complex survey analysis, deep learning and computer vision, and statistical consulting.