10 Jul

Australian Sensitive Data Interest Group Meeting 31: Synthetic Data in Genomics and Health

Learn about two recent CSIRO projects that generate synthetic data in genomics and healthcare.

About the Event

At this meeting of the Australian Sensitive Data Interest Group (AUSDIG), we’ll hear two presentations on synthetic data projects at CSIRO.

Presentations and Speakers

Genomator: generating synthetic genomes

Synthetic data is a valuable resource. We introduce a novel technique for generating synthetic genome data using SAT solvers; the process is demonstrably efficient and accurate, and it has interesting privacy properties. We show how SAT solvers can be harnessed to generate such synthetic data deductively, and how this deductive process can be reversed to quantify the absolute privacy of the results. The resulting private synthetic genome data has potential industry applications.

This presentation will be given by Mark Alexander Burgess, a Research Engineer who recently joined CSIRO’s Transformational Bioinformatics group. Mark has been applying constraint programming, particularly SAT techniques, to the creation of synthetic data. He works in C and Python and explores unconventional programming languages.

Learn more about Genomator.

An approach for generating realistic Australian synthetic healthcare data

Healthcare data is a scarce resource, and access to it is often cumbersome. Medical software development would benefit from real datasets, but patient privacy takes priority.

Realistic synthetic healthcare data can fill this gap by providing datasets for quality control while preserving patients’ anonymity and privacy. Existing methods focus on American or European patient data, but none focuses specifically on Australia, which has a highly diverse population and a unique healthcare system.

To address this gap, we used Synthea, a popular publicly available tool, to generate disease progressions based on the Australian population. With this approach, we generated 100,000 synthetic patients reflecting Queensland demographics.

This presentation will be given by Dr Ibrahima Diouf, a Research Scientist in the Health Intelligence team at the CSIRO Australian e-Health Research Centre. Ibrahima has extensive experience in analysing observational data. His main research interests include statistical methodologies for biomedical research, and he has experience developing and applying causal inference methods.

Read more about this project.

Recording

This event will be recorded. The recording will be provided to all registrants and published here.

About AUSDIG

AUSDIG provides a forum for anyone interested in discussing the challenges of, and strategies for, managing sensitive data. To watch previous meetings and join the mailing list, visit the AUSDIG website.

Do you have questions about this event? Email [email protected].