Australian Dataspaces: An Introduction and FAQs
Explore the concept of Australian dataspaces, designed to securely share and manage restricted access data while maintaining control over its use. Learn how these dataspaces foster collaboration and innovation while enhancing compliance, trust, and transparency in data sharing through various tools and principles.
- Early-/mid- career researchers (EMCRs)
- Senior researchers
- Infrastructure providers (including research facilities)
- Data custodians/managers
- Government
- Industry
By the end of reading this resource, you should:
- understand what dataspaces are and their benefits
- find answers to some of the most frequently asked questions about dataspaces – including who they are for, why they should be adopted, their components and mechanisms, and models and certification for them
- find out what the ARDC is doing on dataspaces in Australia and get involved.
November 2024
- Question added about policy implementation in dataspaces
Sharing data can accelerate research impact. However, significant data, especially industry data, that could be shared under controlled access are not being made available due to concerns over trust and loss of control over the data once shared. Dataspaces provide a blueprint to share restricted access data more widely by ensuring control over the data even after it has been shared.
Through the Australian Dataspaces activity of our Planet Research Data Commons for earth and environment science research, the ARDC is exploring the potential of applying the dataspaces concept to the Australian data landscape.
An Introduction to Dataspaces
A dataspace is digital infrastructure that enables participants to find, access and use data based on the governance framework of that dataspace. The vision for dataspaces is to create trusted, secure ecosystems for data exchange where all participants follow agreed rules. Data providers set the usage policies for their data and control over that data can persist even after it is shared.
Dataspaces provide a mechanism for data at all levels of sensitivity to be shared across multiple systems and sources. A dataspace is built on 4 pillars:
A technical connector component is used to participate in a dataspace and supports these 4 pillars. Connectors are often open source, so anyone can inspect the code or, with the right skills, create their own connector, which they may choose to have certified.
The fundamentals of dataspaces have emerged over many decades but have recently been consolidated and matured by the International Data Spaces Association (IDSA).
The work of IDSA has been adopted in the most recent European Data Strategy with the European Union establishing the Data Spaces Support Centre (DSSC) to coordinate and drive adoption of the dataspaces approach.
The IDSA and Data Spaces Support Centre together provide a flexible set of policies, guidance, and protocols to support the implementation of dataspaces. Dataspaces also frequently include services to link cloud providers, such as Gaia-X, or other services provided by organisations like FIWARE, the Eclipse Foundation, or Prometheus-X. A number of IDSA use cases highlight how dataspaces are being used, and there are over 150 dataspaces on the IDSA radar in various stages of development, including Europeana, the European Health Data Space and a new dataspace hub in Japan. Some of the most successful operational dataspaces include Catena-X, the Green Deal Dataspace, and the Mobility Data Space.
In its simplest form, a ‘minimum viable dataspace’ requires only one data owner and one data consumer. However, the full benefit of dataspaces will result from the ‘network effect’ of adoption by as many participants as possible.
For more information:
- visit the Data Spaces Support Centre and read their 101 introductory material
- read about the Data Spaces Reference Architecture Model.
ARDC and Australian Dataspaces
Through the Australian Dataspaces activity of our Planet Research Data Commons for earth and environment science research, the ARDC is currently conducting exploratory work around establishing a dataspace capability in Australia. This initiative aims to investigate the feasibility and value of establishing dataspaces and support for dataspaces in Australia. Learn more and register your interest in our work.
The ARDC is also a member of the International Data Spaces Association (IDSA), contributing to the development of the ISO standard, conducting trials and advocating for aspects of the approach to ensure Australian researchers have a competitive advantage through data.
FAQs About Dataspaces
A dataspace is digital infrastructure that enables data transactions between participants, based on the governance framework of that dataspace. It is used for securely sharing restricted access data.
A dataspace is established by a community that has a need to come together around specific use cases that require data from a range of data owners that have specific requirements around data sovereignty, security and governance (i.e. the data owners cannot or do not wish to make all the data open). The standardisation of processes and the interoperability within a dataspace make this a cost-effective, efficient solution for data sharing across geographic boundaries or across sectors, such as industry, research institutions, government, health care providers, and NGOs.
A dataspace allows consumers and suppliers of data to extend their network of data connections beyond those operating within an existing network with bespoke solutions, whilst allowing each participant in the dataspace to self-determine how their data is accessed and used within a trusted environment. A dataspace ensures recognition of data sharing rules, enforcement of negotiated contracts and agreements, and retention of data sovereignty by the data owners. It provides a rules-based environment in which disparate data owners can participate in order to progress collaborative undertakings, with confidence that their data will not be compromised and will be used in accordance with their requirements.
The biggest reason to build a dataspace, ultimately, may be seamless and efficient data interoperability that can span multiple organisations and domains while maintaining sovereignty with data owners.
A key building block of each dataspace is the collaborative development of a business model where the different parties involved collaborate towards shared objectives while considering each dataspace participant’s incentives and business models.
Below is a summary of advantages of a dataspace.
- A dataspace provides data suppliers with the ability to maintain control over data after it is shared. This is done by providing mechanisms that ensure data consumers use and access data as required. Examples of these mechanisms include consumer compliance systems and communication of rule violation consequences.
- High levels of standardisation within dataspaces enable easy scaling to additional data owners and data consumers, fostering broader data economies and eliminating the need for costly, bespoke bilateral agreements.
- A dataspace provides seamless and secure interoperability among data suppliers, data consumers and intermediaries by creating an interconnected network of participants enabled through technologies and processes that can facilitate interoperability across different jurisdictions and cloud providers.
- A dataspace provides a comprehensive solution to building trust among participants through standardised policies and agreements that transparently establish rules and compliance mechanisms for all.
- A dataspace accelerates sharing of high-quality data assets that could be shared under appropriate conditions but are currently not.
A dataspace connector is the service that provides access to data in accordance with the rules defined in the governance framework of that dataspace. A connector serves as the interface through which data owners and data consumers exchange data securely by enforcing access and usage policies.
There are a variety of dataspace connectors, including a number of popular open source connectors that vary in complexity and in the level of security they provide. Some examples include the FIWARE TRUE Connector, the Prometheus-X Dataspace Connector, the Eclipse Dataspace Components (EDC) Connector, and the IDS Connector. View an example of a connector UI.
A full-service connector provides:
Data connectors are secure and compartmentalised components, ensuring data flows safely between participants. Compartmentalisation is similar to quarantining, but where quarantining blocks all traffic until threats are eliminated, compartmentalisation limits the impact of a potential breach to the affected component and so reduces the risk to the entire system. Controlled and monitored data exchanges can continue safely, because specific components are isolated to contain issues without requiring complete isolation from the entire network.
Typically, the clearinghouse provides reports on data transactions including the number of transactions, ensuring transparency by logging what data has been shared and under what conditions, including compliance with rules.
The data connectors relay necessary information about transactions to the clearinghouse to ensure adherence to governance policies and enable tracking for audits.
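The reporting role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TransactionRecord` fields and the `Clearinghouse` class are hypothetical names chosen for this sketch, not part of any IDSA specification or connector API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TransactionRecord:
    """What a connector might relay after an exchange (hypothetical fields)."""
    provider: str
    consumer: str
    asset: str
    policy_id: str
    compliant: bool
    timestamp: datetime


class Clearinghouse:
    """Keeps an append-only log of transactions and summarises it for audits."""

    def __init__(self):
        self._records = []

    def log(self, record):
        # Connectors call this once per completed data transaction.
        self._records.append(record)

    def report(self):
        # Summarise how many transactions occurred, per asset, and which
        # ones were flagged as not complying with the agreed policy.
        per_asset = Counter(r.asset for r in self._records)
        return {
            "total_transactions": len(self._records),
            "transactions_per_asset": dict(per_asset),
            "non_compliant": [r for r in self._records if not r.compliant],
        }


clearinghouse = Clearinghouse()
now = datetime(2024, 11, 1, tzinfo=timezone.utc)
clearinghouse.log(TransactionRecord("owner-a", "lab-x", "asset:abc", "policy:12345", True, now))
clearinghouse.log(TransactionRecord("owner-a", "lab-y", "asset:abc", "policy:12345", False, now))
clearinghouse.log(TransactionRecord("owner-b", "lab-x", "asset:def", "policy:67890", True, now))
summary = clearinghouse.report()
```

A real clearinghouse would also need tamper-evident storage and authenticated submissions, which this sketch leaves out.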
In many data sharing arrangements, data owners effectively lose control over their data once it leaves their infrastructure, relying on the data consumer’s internal processes to adhere to agreed terms. A dataspace mitigates this concern by ensuring that data owners retain full control over their data through automated enforcement mechanisms implemented by the dataspace connector and other associated components, such as an authentication service, a broker (catalogue), and a clearinghouse (Figure 2).
Additionally, before accessing any data, data consumers must become members of the dataspace and complete an onboarding process. This process ensures that they understand the correct data usage policies and the consequences of misuse. Once onboarded, data consumers are subject to automated monitoring and compliance verification systems that continuously track data usage to ensure adherence to the established rules. Audits can also be conducted to verify that data consumers are abiding by these conditions. These measures provide data owners with greater assurance that their data remains under their control throughout its lifecycle, even after it has been shared.
If a data consumer misuses data, the provider has several options, such as:
- revoking access
- deleting data from the consumer’s environment
- imposing fines or financial penalties
- taking legal action.
Misuse can also result in reputational damage, loss of certification and increased scrutiny through additional audits.
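As a rough illustration of how onboarding and access revocation could be tied to every access decision, the following Python sketch keeps a simple membership registry. The `MembershipRegistry` class and its method names are hypothetical and exist only for this example. Real dataspaces rely on an authentication service and certified connectors instead.

```python
class MembershipRegistry:
    """Tracks which consumers are onboarded and which have had access revoked."""

    def __init__(self):
        self._onboarded = set()
        self._revoked = set()

    def onboard(self, consumer_id, accepted_usage_policies):
        # Onboarding only succeeds once the consumer has accepted the
        # usage policies and the consequences of misuse.
        if not accepted_usage_policies:
            raise ValueError("consumer must accept the usage policies first")
        self._onboarded.add(consumer_id)

    def revoke(self, consumer_id):
        # One of the provider's options after misuse: revoking access.
        self._revoked.add(consumer_id)

    def may_access(self, consumer_id):
        # Access requires membership that has not been revoked.
        return consumer_id in self._onboarded and consumer_id not in self._revoked


registry = MembershipRegistry()
registry.onboard("lab-x", accepted_usage_policies=True)
access_before_misuse = registry.may_access("lab-x")
registry.revoke("lab-x")
access_after_misuse = registry.may_access("lab-x")
```

The check is deliberately deny-by-default: a consumer that was never onboarded is refused in the same way as one whose access was revoked.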
Establishing bespoke, bilateral data sharing agreements is usually a time-consuming, highly granular, and costly process. This can create significant barriers to effective data sharing.
Dataspaces address this issue by offering a standardised approach to data governance and handling. This standardisation significantly reduces the need for crafting custom agreements and technical solutions for each data exchange. By using a common framework for data sharing agreements and interoperable secure systems, dataspaces streamline the process, lowering the time, effort, and costs associated with negotiating and enforcing these agreements. Additionally, the transparent governance provided by dataspaces ensures that all participants adhere to the established rules, further reducing the need for complex and costly bespoke arrangements.
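A back-of-the-envelope calculation shows why bespoke agreements scale poorly. The sketch below assumes, purely for illustration, that every pair of participants would otherwise need its own bilateral agreement, and that joining a dataspace takes one agreement per participant with the shared framework. Both function names are hypothetical.

```python
def bilateral_agreements(participants: int) -> int:
    """Agreements needed if every pair of participants negotiates its own."""
    return participants * (participants - 1) // 2


def dataspace_agreements(participants: int) -> int:
    """Agreements needed if each participant signs up once to a shared framework."""
    return participants


# Compare the two approaches as the number of participants grows.
for n in (2, 5, 10, 50):
    print(n, bilateral_agreements(n), dataspace_agreements(n))
```

Under these assumptions, 10 participants need 45 pairwise agreements but only 10 memberships, and the gap widens quadratically as more participants join.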
Policies in dataspaces are usually implemented in machine-readable formats like JSON or XML, using languages such as ODRL (Open Digital Rights Language). This standardisation allows connectors and data sharing systems to interpret and automatically enforce these policies. By encoding policies in a widely accepted language like ODRL, seamless interoperability is achieved across various platforms and participants. Participants can then exchange and implement rules uniformly, which reduces ambiguity and misinterpretation of policy definitions.
Policies in dataspaces address complex rules, such as restricting data use to specific purposes or users, or prohibiting redistribution. The IDSA framework recognises that interpretation challenges may arise, especially between disciplines. To address this, dataspaces are designed with intra-space (domain-specific) and inter-space (cross-disciplinary) policy distinctions. This approach tailors guidelines to accommodate either discipline-specific or cross-sector data sharing requirements, enhancing interoperability and reducing inconsistencies.
For example, a JSON-encoded policy might specify conditions as follows:
{
  "policy": {
    "uid": "http://example.org/policy:12345",
    "type": "Set",
    "permissions": [
      {
        "target": "http://example.org/asset:abc",
        "action": "use",
        "constraints": [
          {
            "leftOperand": "dateTime",
            "operator": "lt",
            "rightOperand": "2024-12-31T23:59:59Z"
          }
        ]
      }
    ],
    "prohibitions": [
      {
        "target": "http://example.org/asset:abc",
        "action": "redistribute"
      }
    ],
    "obligations": [
      {
        "action": "delete",
        "constraint": {
          "leftOperand": "dateTime",
          "operator": "gt",
          "rightOperand": "2024-12-31T23:59:59Z"
        }
      }
    ]
  }
}
In this example:
- permission allows the asset to be used until the specified date
- prohibition prevents redistributing the asset
- obligation requires the asset to be deleted after the specified date.
This code-based approach to policies in dataspaces ensures consistent interpretation and smooth compliance, supporting complex data sharing arrangements while maintaining clear, enforceable standards.
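To make the idea of automatic enforcement concrete, here is a minimal Python sketch of how a connector might evaluate the example policy above. It is an assumption-laden illustration, not how any particular connector works: the function names are hypothetical, the key names simply mirror the example rather than a formal ODRL serialisation, and only the `dateTime` operand with the `lt` and `gt` operators is handled.

```python
from datetime import datetime, timezone

ASSET = "http://example.org/asset:abc"

# The example policy, reduced to the parts the sketch evaluates.
POLICY = {
    "permissions": [{
        "target": ASSET,
        "action": "use",
        "constraints": [{"leftOperand": "dateTime", "operator": "lt",
                         "rightOperand": "2024-12-31T23:59:59Z"}],
    }],
    "prohibitions": [{"target": ASSET, "action": "redistribute"}],
    "obligations": [{
        "action": "delete",
        "constraint": {"leftOperand": "dateTime", "operator": "gt",
                       "rightOperand": "2024-12-31T23:59:59Z"},
    }],
}


def parse_time(value):
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def constraint_holds(constraint, now):
    # Unknown operands or operators fail closed.
    if constraint["leftOperand"] != "dateTime":
        return False
    limit = parse_time(constraint["rightOperand"])
    if constraint["operator"] == "lt":
        return now < limit
    if constraint["operator"] == "gt":
        return now > limit
    return False


def is_allowed(policy, target, action, now):
    # A matching prohibition always wins over any permission.
    for rule in policy.get("prohibitions", []):
        if rule["target"] == target and rule["action"] == action:
            return False
    for rule in policy.get("permissions", []):
        if rule["target"] == target and rule["action"] == action:
            return all(constraint_holds(c, now) for c in rule.get("constraints", []))
    return False  # deny anything the policy does not explicitly permit


def due_obligations(policy, now):
    # Actions the consumer must now carry out, such as deleting the asset.
    return [o["action"] for o in policy.get("obligations", [])
            if constraint_holds(o["constraint"], now)]


before_expiry = datetime(2024, 6, 1, tzinfo=timezone.utc)
after_expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)
```

With these definitions, `is_allowed(POLICY, ASSET, "use", before_expiry)` returns `True`, the same call with `after_expiry` returns `False`, redistribution is refused at any time, and `due_obligations(POLICY, after_expiry)` returns `["delete"]`.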
For more information, see these sources:
- policy patterns for usage control in data spaces and associated repository
- example policies
- IDS usage control policies
- IDSA Information Model, an RDFS/OWL-ontology
- Data Spaces Support Centre glossary, inclusive of policy terms.
Common standards are the key to interoperability. The more standardised the data, governance, semantics and infrastructure are, the easier a dataspace is to set up and manage.
Common standards include common data models, data formats, reference architectures, communication protocols etc. Further, by using shared vocabularies and ontologies, dataspaces can facilitate high levels of semantic interoperability.
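As a small illustration of semantic interoperability, the Python sketch below maps records from two providers with different local field names onto one shared vocabulary. The vocabulary terms, provider names and field names are all invented for this example. A real dataspace would typically use published ontologies instead of a hand-written dictionary.

```python
# Hypothetical shared vocabulary agreed by the dataspace participants.
SHARED_VOCABULARY = {"station_id", "observed_at", "temperature_celsius"}

# Hypothetical per-provider mappings from local field names to shared terms.
MAPPINGS = {
    "provider_a": {"site": "station_id", "time": "observed_at",
                   "temp_c": "temperature_celsius"},
    "provider_b": {"stationCode": "station_id", "timestamp": "observed_at",
                   "airTemp": "temperature_celsius"},
}


def to_shared(record, mapping):
    """Rename a record's fields to shared terms, refusing unmapped fields."""
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"no shared term for fields: {sorted(unmapped)}")
    return {mapping[field]: value for field, value in record.items()}


record_a = {"site": "S1", "time": "2024-11-01T00:00:00Z", "temp_c": 21.5}
record_b = {"stationCode": "S9", "timestamp": "2024-11-01T00:00:00Z", "airTemp": 18.0}

shared_a = to_shared(record_a, MAPPINGS["provider_a"])
shared_b = to_shared(record_b, MAPPINGS["provider_b"])
```

After mapping, both records expose the same field names, so a consumer can combine them without provider-specific code.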
Dataspaces provide a unified governance structure and policies, as well as technical infrastructure such as APIs and middleware, to facilitate exchange as agreed. Decentralised, federated governance and flexible, modular infrastructure can also facilitate interoperability with less standardised organisations, but growth in connectivity is associated with increased standardisation.
Dataspaces enable the creation of dynamic data ecosystems by fostering secure and decentralised data sharing between multiple stakeholders, including industries, governments, and research institutions. These ecosystems are built on distributed infrastructures, where data remains with the data owner and is shared only when needed, ensuring data sovereignty and trust. Dataspaces integrate participants across different domains through interoperable standards and common governance frameworks, facilitating smooth data exchange without the need for centralised control.
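The principle that data stays with its owner until it is needed can be sketched as a catalogue that holds only metadata. In this hypothetical Python example, the `CatalogueEntry` and `Broker` names, the fields and the endpoint URL are all assumptions made for illustration. The entry points to the owner's connector and never stores the data itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata describing an asset. The data itself is not stored here."""
    asset_id: str
    title: str
    owner: str
    connector_endpoint: str  # where the owner's connector can be reached


class Broker:
    """A catalogue that lets consumers discover assets without holding them."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.asset_id] = entry

    def search(self, keyword):
        # Simple case-insensitive match on the title.
        needle = keyword.lower()
        return [e for e in self._entries.values() if needle in e.title.lower()]


broker = Broker()
broker.register(CatalogueEntry(
    asset_id="asset:abc",
    title="Coastal water quality observations",
    owner="owner-a",
    connector_endpoint="https://connector.owner-a.example/api",  # hypothetical URL
))
matches = broker.search("water quality")
```

A consumer that finds a match would then negotiate access with the owner's connector directly, so the data only moves once both sides have agreed to the usage policy.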
Trusted research environments (TREs) and dataspaces are complementary.
TREs are typically used to provide controlled environments where sensitive data, such as healthcare or government data, is aggregated, accessed and analysed by researchers under strict governance and security protocols.
Dataspaces provide the trusted and secure data infrastructure that can supply critical restricted access data to TREs. Without a dataspace, the transfer of such data could take years to negotiate, or the data might not be supplied at all.
Together, they can support secure data sharing in specific research projects while allowing for broader collaboration across industries and sectors.
The International Data Spaces Reference Architecture Model (IDS-RAM) is an abstract model for a generalised dataspace, representing the overall principles of operation, and layers of functionality any dataspace can be expected to encapsulate. It is specific to the International Data Spaces Association (IDSA) vision of dataspaces, but is general enough to be useful for understanding a dataspace of any variety.
The IDS-RAM looks at 3 perspectives on a dataspace that capture the overarching principles of how a dataspace operates:
The IDS-RAM covers 5 layers of a dataspace. These layers group the functions a dataspace provides:
Certifications provide evidence that a dataspace has met specific requirements as assessed by an independent accredited certification body. As a dataspace matures and grows, a variety of certifications are often sought by participating entities. These certifications may apply to all participants, or may be sought by individual participants. The International Data Spaces Association (IDSA) specifies certifications for both participants and components in a dataspace and at different levels. IDSA certifications tend to closely align with other international certifications such as:
- ISO 27001 for information security management (which is highly compatible and reusable with IDS certification)
- ISO 9001 for quality management systems
- ISO 27701 for privacy information management.
The numerous benefits of dataspace certifications include:
International Data Spaces (IDS) certifications provide a structured and progressive approach to signalling levels of trustworthiness in a maturing dataspace, with different types and levels of certification.
Operational Environment Certification
Component Certification
All IDS components, including hardware, IDS connectors, metadata brokers, apps and services, and the app store can be certified at varying levels.
International Data Spaces Association (IDSA)
The International Data Spaces Association (IDSA) develops and promotes the International Data Spaces model for secure and sovereign data sharing, ensuring interoperability and trust among diverse stakeholders. It is a nonprofit organisation based in Germany that was founded in 2016. Its aim is to provide a secure, privacy-preserving, and trustworthy scheme for data exchange, known as the International Data Spaces (IDS).
Data Spaces Support Centre (DSSC)
The Data Spaces Support Centre (DSSC) supports the implementation and scaling of dataspaces, especially across Europe, by providing guidance, tools, and resources to ensure adherence to best practices and standards.
International Organization for Standardization (ISO)
The International Organization for Standardization (ISO) is an independent NGO that develops and publishes international standards across a wide range of industries, including data management, security, and interoperability. The ISO is developing a standard for dataspaces concepts and characteristics (ISO/IEC AWI 20151), to which the ARDC is contributing.
These standards provide guidelines and best practices to ensure that products, services, and systems are safe, reliable, and of good quality. Some of these standards can be certified, and some are important components of dataspaces, such as the ISO 27001 security standard.
International Electrotechnical Commission (IEC)
The International Electrotechnical Commission (IEC) is a global organisation that prepares and publishes international standards for electrical, electronic, and related technologies. These standards ensure the safety, efficiency, and interoperability of systems, including those used in dataspaces. For example, standards such as IEC 62443 are foundational in some dataspace certifications.
Cloud service providers (CSPs)
Cloud service providers or CSPs (e.g. Microsoft Azure, Google Cloud, and Amazon Web Services) offer the infrastructure and tools necessary for deploying and operating secure and scalable dataspaces, supporting data storage, processing, and security.
Gaia-X
Gaia-X is a European initiative that aims to create a federated data infrastructure for secure and interoperable data sharing across Europe, industries and different cloud service providers.
FIWARE Foundation
The FIWARE Foundation provides an open-source platform and tools for building interoperable and scalable digital services, including those involving dataspaces across various sectors.
Eclipse Foundation
The Eclipse Foundation governs the Eclipse Dataspace Components (EDC) Framework, a comprehensive framework providing a basic set of features (functional and non-functional) that dataspace implementations can reuse and customise by leveraging the framework’s defined APIs, while ensuring interoperability by design. It is powered by the specifications of the Gaia-X AISBL Trust Framework and the IDSA Dataspace Protocol.
The EDC is designed for developers who want to build dataspace implementations on an existing, standards-based framework and to adopt and adapt it with their own solutions. Developers use the EDC to build data-sharing services for their customers.
Big Data Value Association
The Big Data Value Association is an industry-driven research and innovation organisation that supports the European big data economy by promoting data-driven innovation and coordinating efforts in establishing data sharing frameworks and infrastructures.
MyData Global
MyData Global is a non-profit organisation that advocates for ethical data use and personal data sovereignty, developing frameworks and standards for managing personal data within dataspaces.
Register Your Interest in Australian Dataspaces
We’re establishing dataspaces in Australia to create trusted, secure ecosystems for data exchange between research, industry and government. Stay up to date with the project by registering your interest via the form below.