FAIR Containers for Research Software
Containers are a way to easily distribute software across different computational environments. Learn how to make containers findable, accessible, reusable and interoperable (FAIR) to maximise the benefits of your research software.
This resource is for:
- Research software developers and engineers
- Higher-degree researchers (HDRs) / PhD candidates
- Early- and mid-career researchers (EMCRs)
- Senior researchers
- Infrastructure providers (including research facilities)
- Digital skills trainers
For a beginner’s overview of containers and how they can help researchers, read and watch our introduction to containers.
By the end of reading this resource, you should:
- understand how containers help make research software findable, accessible, reusable and interoperable (FAIR)
- know how to make containers themselves FAIR to that end.
Containers package an application with its environment so that it can run on any computing service, from a laptop to the public cloud.
Containers are widely used due to their portability and reusability. Australian research platforms like BioCommons, AURIN and Neurodesk all use containers to provide easily accessible, scalable services.
The benefits of research software can be maximised when it is made available in a way that encourages collaboration and promotes reproducibility, transparency and integrity. Containers are an excellent method for achieving this because they make it easy and efficient to redistribute software.
In this guide, we’ll explore how containers help make research software FAIR and how to make containers themselves FAIR.
Containers and FAIR Research Software
The FAIR Principles (“Findable, Accessible, Interoperable and Reusable”) are a framework for increasing the transparency, reproducibility and reusability of research. Originally developed for research data, they also apply to other research outputs, including research software. Published in 2022, the FAIR Principles for Research Software (FAIR4RS) outline specifically how to make research software FAIR.
In short, research software should:
- have its source code stored in a publicly accessible code repository with version control
- have an appropriate licence for reuse
- be discoverable in a community software registry
- have a unique identifier (like a DOI) so it can be uniquely identified, accessed and cited.
Ideally the source code repository should also expose rich metadata which describes the application and makes it machine discoverable. Software metadata schemas include CodeMeta, which extends Schema.org software types.
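As a rough illustration of such metadata, the sketch below builds a minimal codemeta.json record with Python's standard library. The project name, URLs and author are placeholders (assumptions, not taken from this guide), and only a handful of CodeMeta terms are shown.

```python
import json


def build_codemeta(name, version, repo_url, license_url, authors):
    """Return a minimal CodeMeta 2.0 record as a dictionary."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": repo_url,
        "license": license_url,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# Placeholder values for a hypothetical project.
record = build_codemeta(
    name="example-analysis-tool",
    version="1.0.0",
    repo_url="https://github.com/example-org/example-analysis-tool",
    license_url="https://spdx.org/licenses/MIT",
    authors=[("Ada", "Example")],
)

# This text would be saved as codemeta.json in the root of the repository.
codemeta_json = json.dumps(record, indent=2)
print(codemeta_json)
```

Keeping the record in the repository root means it is versioned and archived together with the code it describes.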
One way to make source code findable and accessible is to create a repository on GitHub and include a CITATION.cff file with the code. Then by using the Zenodo-GitHub integration, each release of the software will be published to Zenodo with the correct metadata and a unique persistent identifier. The archiving of source code can be automated by integrating Zenodo with Software Heritage.
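For illustration, a minimal CITATION.cff can be generated with a few lines of Python. This is a sketch only: the title, version, date and author are placeholders, and it covers just the basic fields of Citation File Format 1.2.0 (cff-version, message, title and authors, plus version and date-released).

```python
import json


def render_citation_cff(title, version, date_released, authors):
    """Render a minimal CITATION.cff (CFF 1.2.0) document as text.

    json.dumps is used for quoting because a JSON string is also a valid
    double-quoted YAML scalar.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {json.dumps(title)}",
        f"version: {json.dumps(version)}",
        f"date-released: {json.dumps(date_released)}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - family-names: {json.dumps(family)}")
        lines.append(f"    given-names: {json.dumps(given)}")
    return "\n".join(lines) + "\n"


# Placeholder values for a hypothetical project.
cff_text = render_citation_cff(
    title="example-analysis-tool",
    version="1.0.0",
    date_released="2024-01-01",
    authors=[("Ada", "Example")],
)
print(cff_text)
```

In practice most projects write this file by hand; generating it is mainly useful when the version and release date already live elsewhere in the build.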
In addition to these steps, containerising software helps make it FAIR, because containers make it easy and efficient to redistribute software.
For example, a researcher reads a seminal paper that cites the software used to analyse its data. However, the research was carried out several years ago, and the software either no longer works or produces unexpected results because the underlying software libraries have since changed. Had the software been containerised along with its dependencies, it could be run reliably on different platforms over a much greater span of time.
Software that is to be containerised should also include all necessary container configuration files (e.g. the Dockerfile) in the archived release of its source code.
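A simple pre-release check can catch missing files before a release is archived. The sketch below is one possible approach, not a prescribed tool; the list of expected file names is an assumption and should be adapted to your project.

```python
from pathlib import Path
import tempfile

# Hypothetical checklist; adjust the file names to your project's conventions.
EXPECTED_FILES = ("Dockerfile", "LICENSE", "CITATION.cff", "README.md")


def missing_release_files(release_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent from release_dir."""
    root = Path(release_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration on a throwaway directory holding only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Dockerfile").write_text("FROM python:3.12-slim\n")
    (Path(tmp) / "LICENSE").write_text("MIT\n")
    missing = missing_release_files(tmp)

print(missing)  # ['CITATION.cff', 'README.md']
```

A check like this could run in continuous integration so that a release without its Dockerfile or licence fails early.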
Making Containers FAIR
For containers to be useful in making software FAIR, they need to be FAIR themselves. Before we explore how to make containers FAIR, it is helpful to clarify some key concepts of containers.
Source code and its dependencies are packaged into containers from a template or a container image. The container image becomes a container instance at runtime when it is executed by a container engine. During the container build process, a configuration file is used to include all the necessary dependencies to ensure the application will run correctly.
In Docker for example, a Dockerfile contains build instructions to package the application with the relevant dependencies into a container image. A container image allows someone to deploy and run the application without rebuilding it from source code.
Apptainer, a high-performance computing (HPC) container engine compatible with Docker images, uses a similar build configuration file (also known as a ‘recipe’) to build container images that run in HPC environments.
Container findability and accessibility
One way to make container images findable and accessible is to share them in a trusted and publicly accessible container registry.
A container registry is a persistent store for container images so they can be deployed on a target platform without needing to rebuild the container from source. Common registries include Docker Hub, GitHub Container Registry, Amazon Elastic Container Registry (ECR) and Azure Container Registry.
In some registries the contents of the container image can be previewed, so the user can check the relevant FAIR information before they deploy the container. For example, Harbor, an open-source registry from the Linux Foundation and the Cloud Native Computing Foundation (CNCF), has been implemented by the ARDC Nectar Research Cloud as the Australian Research Container Orchestration Services (ARCOS) Registry for use by Australian researchers.
Harbor scans container images for known vulnerabilities and, since v2.11, can generate a Software Bill of Materials (SBOM) as part of that process. A user can interrogate the SBOM for a container image and see the contents, including any FAIR-related files.
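As a sketch of what interrogating an SBOM might look like, the function below searches an SBOM that has been exported as SPDX-style JSON for FAIR-related files. It assumes the export contains a top-level "files" list whose entries carry a "fileName" field; whether a given registry's SBOM lists individual files this way is an assumption to verify against a real export. The set of file names is also illustrative.

```python
from pathlib import PurePosixPath

# Illustrative set of files that carry FAIR information.
FAIR_FILE_NAMES = {"CITATION.cff", "LICENSE", "README.md", "codemeta.json"}


def find_fair_files(sbom):
    """Return paths in an SPDX-style SBOM whose base name is a FAIR-related file."""
    paths = []
    for entry in sbom.get("files", []):
        file_name = entry.get("fileName", "")
        if PurePosixPath(file_name).name in FAIR_FILE_NAMES:
            paths.append(file_name)
    return sorted(paths)


# Hand-written sample standing in for a real SBOM export.
sample_sbom = {
    "spdxVersion": "SPDX-2.3",
    "files": [
        {"fileName": "/app/CITATION.cff"},
        {"fileName": "/app/LICENSE"},
        {"fileName": "/usr/lib/python3/site.py"},
    ],
}

found = find_fair_files(sample_sbom)
print(found)  # ['/app/CITATION.cff', '/app/LICENSE']
```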
In future, it would be ideal if registries openly published container image metadata, including FAIR metadata, via an API so that it would be machine-readable and discoverable. A good example of suggested container metadata comes from the BioContainers open-source project.
Container interoperability
Interoperability of software is a complex problem, and containers present their own challenges. Since a container can run anywhere on any platform, it cannot guarantee what services will be available to it at runtime. A container may also run alongside many other containers in a complex workflow and need to communicate with them.
Containers interact with each other using standard mechanisms. They can share data through a persistent volume mounted in both containers, or they can communicate through APIs over a bridge network.
When it comes to storing or exchanging application data, a good starting point is the principles of the Twelve-Factor App, in particular:
Following these rules ensures containers are portable and will be able to exchange information with each other in most environments.
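One rule from the Twelve-Factor App is to store configuration in the environment rather than in the code. The sketch below shows that single rule only; the variable names DATA_DIR and RESULTS_API_URL and their defaults are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration for a containerised application."""
    data_dir: str          # where a shared volume is mounted
    results_api_url: str   # where a neighbouring service is reachable


def load_settings(environ=os.environ):
    """Read settings from environment variables, falling back to defaults."""
    return Settings(
        data_dir=environ.get("DATA_DIR", "/data"),
        results_api_url=environ.get("RESULTS_API_URL", "http://localhost:8080"),
    )


# Passing a plain dict makes the behaviour easy to demonstrate and test.
settings = load_settings({"DATA_DIR": "/mnt/shared"})
print(settings.data_dir)
print(settings.results_api_url)
```

Because nothing about the host is hard-coded, the same image can be pointed at a different volume or service on each platform by changing environment variables at deployment time.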
For different systems and software to understand each other, they must also speak the same language. They must use standard protocols and formats to communicate or store data (e.g. SQL, JSON or XML); this is known as syntactic interoperability. They must also adopt the conventions in meaning and terminology common to their discipline; this is known as semantic interoperability.
These standards should be documented alongside the code so potential users can understand how to interact with the containerised application and integrate it smoothly into different environments.
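To make the two kinds of interoperability concrete, the sketch below exchanges one observation as JSON. The field names and the example values are illustrative assumptions rather than a community standard; in a real project they would come from the vocabulary used in your discipline.

```python
import json

REQUIRED_FIELDS = {"variable", "value", "unit", "observedAt"}


def encode_observation(variable, value, unit, observed_at):
    """Serialise one observation as JSON (syntactic interoperability).

    The unit and an ISO 8601 timestamp are stated explicitly so a consumer
    does not have to guess their meaning (semantic interoperability).
    """
    record = {
        "variable": variable,
        "value": value,
        "unit": unit,
        "observedAt": observed_at,
    }
    return json.dumps(record, sort_keys=True)


def decode_observation(text):
    """Parse an observation and reject records with missing fields."""
    record = json.loads(text)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


message = encode_observation("air_temperature", 21.5, "degC", "2024-01-01T00:00:00Z")
observation = decode_observation(message)
print(observation["value"], observation["unit"])
```

Documenting the required fields next to the code, as the constant above does, gives users of the container a single place to learn what it expects and produces.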
Container reusability
Containers are inherently reusable. A researcher should be able to take a containerised workload and easily run it on whatever platform they desire without having to worry about compatibility issues.
The Reusable principle of FAIR4RS, however, extends much further than just being able to run the software. It also requires that software “can be understood, modified, built upon, or incorporated into other software”.
By linking the application in the container back to its source, researchers can make it more reusable. One way is to include FAIR information from the application source code in the container. Typically this would include the CITATION.cff file (as above), the licence and a README that documents how to build and reuse the software.
This helps users understand the origins of the software in a container image. It supports reproducibility by linking the image to a particular version of the source code, and it preserves the licence information under which the software has been distributed. You can also improve provenance by including qualified references to any other software being used. Follow this step-by-step workflow for publishing a simple container from GitHub as a container image in a container registry like Docker Hub.
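One way to carry this information into an image is to copy the FAIR files into it and record provenance as image labels. The sketch below renders such a Dockerfile from Python. The label keys are the pre-defined annotation keys from the Open Container Initiative (OCI) image specification; the base image, repository URL, revision, file names and directory layout are placeholders, and values containing double quotes are not handled.

```python
def render_dockerfile(base_image, source_url, version, revision, licence):
    """Render a Dockerfile that carries FAIR files and OCI provenance labels."""
    labels = {
        "org.opencontainers.image.source": source_url,
        "org.opencontainers.image.version": version,
        "org.opencontainers.image.revision": revision,
        "org.opencontainers.image.licenses": licence,
    }
    lines = [f"FROM {base_image}"]
    for key, value in labels.items():
        lines.append(f'LABEL {key}="{value}"')
    lines += [
        "WORKDIR /app",
        "# Keep citation, licence and documentation alongside the application.",
        "COPY CITATION.cff LICENSE README.md /app/",
        "COPY src/ /app/src/",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for a hypothetical project.
dockerfile = render_dockerfile(
    base_image="python:3.12-slim",
    source_url="https://github.com/example-org/example-analysis-tool",
    version="1.0.0",
    revision="0123abc",
    licence="MIT",
)
print(dockerfile)
```

Recording the version and revision as labels lets anyone who pulls the image trace it back to the exact source release, even without opening the container.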