Data Versioning

What is data versioning?

New versions of a dataset are created when an existing dataset is reprocessed, corrected or appended with additional data. Data versioning helps track changes to dynamic data, that is, data that is not static over time.
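As a minimal sketch of the idea, assuming a simple sequential numbering scheme, the Python below records each new version of a dataset together with the reason it was created (reprocessing, correction or appending data). The names `VersionHistory`, `DatasetVersion` and `ChangeType` are hypothetical and do not belong to any particular tool. The main design choice is that the history is append-only, so earlier versions remain identifiable after newer ones are added.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class ChangeType(Enum):
    """Why a version was created (INITIAL is added here for the first release)."""
    INITIAL = "initial"
    REPROCESSED = "reprocessed"
    CORRECTED = "corrected"
    APPENDED = "appended"


@dataclass(frozen=True)
class DatasetVersion:
    number: int          # sequential version number (hypothetical scheme)
    change: ChangeType   # reason this version exists
    released: date       # date the version was made available
    note: str = ""       # free-text description of the change


@dataclass
class VersionHistory:
    """Append-only log: new versions are added, old ones are never overwritten."""
    dataset_title: str
    versions: list[DatasetVersion] = field(default_factory=list)

    def add(self, change: ChangeType, released: date, note: str = "") -> DatasetVersion:
        version = DatasetVersion(len(self.versions) + 1, change, released, note)
        self.versions.append(version)
        return version

    def latest(self) -> DatasetVersion:
        return self.versions[-1]

    def get(self, number: int) -> DatasetVersion:
        # Earlier versions stay retrievable, so a citation can point at one.
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"no version {number}")
        return self.versions[number - 1]


history = VersionHistory("Example rainfall dataset")  # hypothetical dataset
history.add(ChangeType.INITIAL, date(2023, 1, 10))
history.add(ChangeType.CORRECTED, date(2023, 3, 2), "fixed mis-keyed station IDs")
history.add(ChangeType.APPENDED, date(2023, 6, 1), "added April-June observations")
print(history.latest().number)       # 3
print(history.get(2).change.value)   # corrected
```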

Why is data versioning important?

Researchers are required to identify and cite the exact dataset used as a research input in order to support reproducible and trustworthy research. This requires good management of data and data revisions, and becomes particularly challenging when the datasets being cited are constantly changing.

For background on the importance of data versioning and recent developments, read the ARDC article Data Versioning – An Epic Chapter of a Long Story. The concept and its benefits are also well summarised in W3C’s Data on the Web Best Practices guide.

Data versioning standards

There is no agreed standard or recommendation among data communities as to how and when data should be versioned. Some data providers may not retain a history of changes to a dataset, opting to make only the most recent version available. Other data providers have documented data versioning policies or guidelines based on their own discipline’s practice, which may not be applicable to other disciplines.

Work on best practice for data versioning across data communities is currently under way globally. The Research Data Alliance (RDA) Data Versioning Working Group has proposed the following guidelines for data versioning:

Revision (version control)
Release (data products)
Granularity (aggregates, composites, collections and time series)
Manifestation (data formats and encodings)
Provenance (derived products)

Data versioning tools

There is no one-size-fits-all solution for data versioning and change tracking. Data comes in different forms and is managed with different tools and methods. Where possible, data managers should use data management tools that support versioning and change tracking.

Example approaches include:

Git (and GitHub) for Data, suited to datasets smaller than about 10 MB or 100,000 rows, which supports distributed collaboration, provenance tracking, and a simple way to share updates and synchronise datasets
Data versioning in ArcGIS, where users can create a new geodatabase version derived from an existing version
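The size guideline for Git-based data versioning can be checked before a dataset is committed. The sketch below is an illustration under stated assumptions and is not part of any Git tooling: the helper `suits_git_for_data` is hypothetical, it reads "10 MB" as decimal megabytes, and it conservatively requires a CSV file to be under both the size limit and the row limit.

```python
import csv
import os
import tempfile

# Limits quoted in the text, read as approximate upper bounds (assumption).
MAX_BYTES = 10 * 1000 * 1000   # 10 MB, taken as decimal megabytes
MAX_ROWS = 100_000


def count_rows(path: str) -> int:
    """Count data rows in a CSV file, not counting the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header, if any
        return sum(1 for _ in reader)


def suits_git_for_data(path: str) -> bool:
    """Hypothetical helper: True only if the file is under BOTH limits."""
    if os.path.getsize(path) >= MAX_BYTES:
        return False  # too large in bytes; no need to count rows
    return count_rows(path) < MAX_ROWS


# Usage: write a small throwaway CSV and check it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["station", "rainfall_mm"])
    for i in range(500):
        writer.writerow([f"S{i}", i % 40])

small_ok = suits_git_for_data(tmp.name)
print(small_ok)  # True
os.remove(tmp.name)
```

Checking the byte size first avoids reading a very large file just to count its rows.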

Citing versioned data

There is no universal way to cite versioned data. The form of the citation will depend on factors including publisher instructions, research domain and type of data. Citations to revisable datasets are likely to include version numbers or access dates.

DataCite recommends the following format for citing versioned data: Creator(s) (Publication Year): Title. Version. Publisher. Identifier.
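A citation in this DataCite format can be assembled from a dataset's metadata. This is a minimal sketch: the `DatasetRecord` fields, the "; " separator between creators and the "Version" label placed before the version string are assumptions, and all values in the example, including the DOI, are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    creators: list[str]    # "Family, Initial" strings (assumed convention)
    publication_year: int
    title: str
    version: str
    publisher: str
    identifier: str        # persistent identifier, e.g. a DOI as a URL


def datacite_citation(rec: DatasetRecord) -> str:
    """Creator(s) (Publication Year): Title. Version. Publisher. Identifier"""
    creators = "; ".join(rec.creators)  # separator is an assumption
    return (f"{creators} ({rec.publication_year}): {rec.title}. "
            f"Version {rec.version}. {rec.publisher}. {rec.identifier}")


record = DatasetRecord(
    creators=["Smith, J.", "Nguyen, T."],            # hypothetical creators
    publication_year=2021,
    title="Example Soil Moisture Observations",
    version="2.1",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.5072/example",    # placeholder DOI
)
print(datacite_citation(record))
```

Because the version string is part of the citation, citations of two different versions of the same dataset remain distinguishable even when the title and publisher are identical.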

You may also be interested in

Australian National Persistent Identifier (PID) Strategy 2024

Vocabulary Symposium 2023 Recordings

Good Data Practices

Resources for HASS and Indigenous Researchers

Related Articles

Data Versioning – An Epic Chapter of a Long Story

Related Resources

Citation and Identifiers

Data and Software Citation

Data Provenance