FAIR for Jupyter Notebooks: A Practical Guide

Learn why and how you should make your Jupyter Notebooks findable, accessible, interoperable and reusable (FAIR).

Jupyter Notebooks are widely used in research and data science [1]. Despite this, making them findable, accessible, interoperable and reusable (FAIR), to the benefit of both creators and users, remains challenging.

A plausible approach is to apply the FAIR Principles to ensure the appropriate use of Jupyter Notebooks. However, the FAIR Principles are high-level aspirations and can be difficult to apply in practice. How they translate into practice also varies by discipline.

On this page, you’ll find practical recommendations on making Jupyter Notebooks FAIR based on the FAIR for Research Software (FAIR4RS) approach. 

What Are Jupyter Notebooks?

The Jupyter Notebook is an open-source, browser-based tool for creating virtual lab notebooks that document research workflows, code, data and visualisations. It is well suited to interactive data science and scientific computing across disciplines, and supports programming languages including Python, R, Julia and many others [2].

Why Make Jupyter Notebooks FAIR?

There are several motivations behind making Jupyter Notebooks FAIR:

  • Researchers are citing their Jupyter Notebooks in their publications [2]. Applying FAIR, a well-known approach, to Jupyter Notebooks leads to better citation scores and greater research impact [3].
  • Many journals, including Sustainability, ask for open code and data.
  • Open science, a movement to make scientific research (including software) and its dissemination accessible to all [4], is gaining momentum, and FAIR is a plausible way to put it into practice.

How to Make Jupyter Notebooks FAIR

As digital objects or files, Jupyter Notebooks can be treated as research software, to which FAIR has long been applied. It’s sensible to build a framework for FAIR Jupyter Notebooks on prior work on FAIR for research software [1] [5] [6].

Based on Barker et al.’s approach to FAIR for research software [5], here’s what a Jupyter Notebook looks like when it is findable, accessible, interoperable and reusable:

The Jupyter Notebook and its associated metadata are easy for both humans and machines to find. Specifically:

  • the Jupyter Notebook is assigned a globally unique and persistent identifier – DOIs are suggested (F1)
  • different versions of the Jupyter Notebook are given distinct identifiers – it is recommended that researchers assign DOIs to publishable research and ensure version tracking is in place [1] (F1.1)
  • the Jupyter Notebook is described with rich metadata (F2)
  • the metadata are searchable and indexable (F3).
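
As an illustration of F1, a DOI becomes a resolvable link when it is appended to the doi.org resolver address. The sketch below is a minimal example, not part of the FAIR4RS principles: the helper name `doi_to_url` and the loose pattern check are assumptions made for this illustration, and the DOI used is the example notebook listed under Further Resources.

```python
import re

# Loose shape check only: "10.", a registrant code of 4-9 digits, "/", a non-empty suffix.
# This pattern is an assumption for illustration; it does not prove the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = doi.strip()
    # Accept DOIs written with a "doi:" prefix or as a full resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a DOI-shaped string: {doi!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("10.5281/zenodo.7690164"))
```

Because each release can carry its own DOI (F1.1), the same helper can be used to link to one specific version of a notebook.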

The Jupyter Notebook and its metadata are retrievable via standardised protocols. Specifically:

  • the protocol is open, free and universally implementable (A1.1)
  • the protocol allows for an authentication and authorization procedure, where necessary (A1.2)
  • the metadata are accessible, even when the software is no longer available (A2).

Git versioning is suggested as it’s widely used in software, research and data science.

The Jupyter Notebook interoperates with other Jupyter Notebooks or software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs).

Ideally, standard function calls are used with readable comments and open data formats (I1).
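
One way to read this recommendation is sketched below: results are written with standard-library calls to CSV, with a JSON file alongside describing them, so that other notebooks and tools can read both. The function names, file names and sample values are hypothetical and chosen only for this example.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_results(rows: list[dict], data_path: Path, metadata: dict) -> Path:
    """Write rows to CSV and the describing metadata to a JSON sidecar file."""
    if not rows:
        raise ValueError("rows must not be empty")
    # CSV is a plain-text, openly documented format that most tools can read.
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    # Keep the metadata next to the data: results.csv -> results.json
    sidecar = data_path.with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


def load_results(data_path: Path) -> list[dict]:
    """Read the CSV back; every value is returned as a string."""
    with data_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical example data, used only to show the round trip.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "results.csv"
    save_results(
        [{"sample": "A", "value": 1.5}, {"sample": "B", "value": 2.0}],
        csv_path,
        {"description": "Example measurements", "units": {"value": "mg/L"}},
    )
    print(load_results(csv_path))
```

CSV does not record data types, so numbers come back as strings. Recording units and types in the JSON sidecar is one way to keep that information with the data.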

The Jupyter Notebook is both executable and reusable (i.e. can be understood, modified, built upon or incorporated into other Jupyter Notebook source code). Specifically, the Jupyter Notebook is:

  • described with a plurality of accurate and relevant attributes (R1)
  • given a clear and accessible licence (R1.1)
  • associated with detailed provenance (R1.2)
  • accompanied by qualified references to other software and source code (R2)
  • in line with domain-relevant community standards (R3).

To achieve some of the above components and subcomponents of a FAIR Jupyter Notebook, consider the following recommendations:

Instead of storing the Jupyter Notebook on a local computer, use open-source version control on a publicly hosted repository.

Suggested repositories include GitHub.com, Bitbucket.org and GitLab.com [6].

Relevant FAIR (sub)components: accessible, specifically A1.1 and A1.2

Include keywords and the purpose of your research in the metadata, and make the metadata searchable and indexable.

Use a CITATION.cff file, which stores citation metadata in a machine-readable format and lets others know how to cite your work.

Relevant FAIR (sub)components: F2, F3, A2
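
For illustration, the sketch below generates the text of a minimal CITATION.cff file. It assumes version 1.2.0 of the Citation File Format, in which `cff-version`, `message`, `title` and `authors` are required and `keywords`, `version` and `doi` are optional. The function name, the title and the author are placeholders and should be replaced with your own details.

```python
import json


def _quoted(text):
    # JSON string syntax is also valid YAML, so json.dumps gives safe quoting.
    return json.dumps(str(text), ensure_ascii=False)


def render_citation_cff(title, authors, keywords=(), version=None, doi=None):
    """Return the text of a minimal CITATION.cff file (CFF schema version 1.2.0).

    authors: list of (given_names, family_names) tuples.
    """
    if not authors:
        raise ValueError("CITATION.cff needs at least one author")
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this notebook, please cite it using these metadata."',
        f"title: {_quoted(title)}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f"  - given-names: {_quoted(given)}")
        lines.append(f"    family-names: {_quoted(family)}")
    if keywords:
        lines.append("keywords:")
        lines.extend(f"  - {_quoted(word)}" for word in keywords)
    if version is not None:
        lines.append(f"version: {_quoted(version)}")
    if doi is not None:
        lines.append(f"doi: {_quoted(doi)}")
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with the details of your own notebook.
print(render_citation_cff(
    title="Example FAIR notebook",
    authors=[("Ada", "Example")],
    keywords=["jupyter", "FAIR"],
    version="1.0.0",
))
```

Saving the printed text as CITATION.cff in the root of the repository keeps the keywords and citation details in one machine-readable place.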

Before making the repository publicly available, set a licence. GitHub provides an easy-to-follow guide to licensing a repository [7].

Note that you should refer to your institutional guidelines on licensing. Having a licence protects the reusability and sharing capabilities of your software or code. 

Make sure to update the licence year and owner. Common licences include the Apache License 2.0, MIT License and GNU General Public License v3.0, the last of which could look like this: 

GNU GENERAL PUBLIC LICENSE

Version 3, 29 June 2007

Copyright © 2007 Free Software Foundation, Inc. https://fsf.org/ 

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

[…]

[Name], [Year]

Relevant FAIR (sub)components: reusable, specifically R1.1

The Jupyter Notebook often requires additional libraries, or a particular version of a language or package, to run properly. A thorough guide to adding those dependencies is provided in the Binder documentation [8].

Providing the code dependencies in an appropriate manner – say by using the Binder service, creating a container or using a virtual environment – allows the intended audience to reproduce the environment accurately.
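
As a sketch of the virtual-environment route, the snippet below looks up the installed versions of the packages a notebook imports and formats them as pinned `name==version` lines, the form used in a requirements.txt file. The function name and the package list are assumptions for this example; list whatever your own notebook imports.

```python
from importlib import metadata


def pin_requirements(packages, get_version=metadata.version):
    """Return ('name==version' lines, names that are not installed)."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            missing.append(name)
    return pinned, missing


# Hypothetical package list; replace it with the packages your notebook uses.
pinned, missing = pin_requirements(["numpy", "pandas", "matplotlib"])
print("\n".join(pinned))
if missing:
    print("Not installed here:", ", ".join(missing))
```

Saving the pinned lines as requirements.txt in the repository records the versions the notebook was last run with.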

You can create a Binder badge for your repository [9]. A Google Colab badge can also be created, though the dependencies are not automatically built in [10].

Relevant FAIR (sub)components: reusable
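
Both badges are ordinary Markdown image links that can be pasted into a README. The sketch below builds them for a repository hosted on GitHub, assuming the public mybinder.org service and the standard Colab badge image. The function names and the repository `example-user/fair-notebook` are hypothetical.

```python
def binder_badge(user, repo, ref="HEAD"):
    """Markdown for a badge that launches a GitHub repository on mybinder.org."""
    # The names are inserted as-is, so they must already be URL-safe.
    launch_url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({launch_url})"


def colab_badge(user, repo, branch, notebook_path):
    """Markdown for a badge that opens one notebook from GitHub in Google Colab."""
    open_url = (
        "https://colab.research.google.com/github/"
        f"{user}/{repo}/blob/{branch}/{notebook_path}"
    )
    image = "https://colab.research.google.com/assets/colab-badge.svg"
    return f"[![Open In Colab]({image})]({open_url})"


# Hypothetical repository, shown only to illustrate the link format.
print(binder_badge("example-user", "fair-notebook"))
print(colab_badge("example-user", "fair-notebook", "main", "analysis.ipynb"))
```

The Binder link points at the whole repository, so Binder can build the environment from the dependency files in it. The Colab link opens a single notebook, which is why its dependencies have to be installed separately.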

Provide references wherever applicable and mention the origins of the code.

Assign a persistent identifier (DOI) to the repository once the research is completed. GitHub has a thorough guide to generating DOIs for repositories [11].

Note that when the repository is archived through Zenodo, as described in that guide, a new DOI is generated automatically for each new release.

Relevant FAIR (sub)components: F1, F1.1
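
The recommendations above can be turned into a quick self-check before a release. The sketch below only tests whether the expected files exist in a local copy of the repository. The function name and the file names it looks for are assumptions based on common conventions, and it does not check that the files' contents are correct.

```python
from pathlib import Path

# Conventional file names; these lists are assumptions, adjust them to your project.
LICENCE_FILES = ("LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING")
ENVIRONMENT_FILES = ("requirements.txt", "environment.yml", "Dockerfile")


def fair_checklist(repo_dir):
    """Map each checklist item to True or False for the repository at repo_dir."""
    root = Path(repo_dir)

    def has_any(names):
        return any((root / name).is_file() for name in names)

    return {
        "notebook (.ipynb) present": any(root.rglob("*.ipynb")),
        "under Git version control (A1.1, A1.2)": (root / ".git").exists(),
        "CITATION.cff (F2, F3, A2)": (root / "CITATION.cff").is_file(),
        "licence file (R1.1)": has_any(LICENCE_FILES),
        "dependency file (reusable)": has_any(ENVIRONMENT_FILES),
    }
```

Calling `fair_checklist(".")` from the root of the repository returns a dictionary that can be printed item by item. Assigning a DOI (F1, F1.1) happens outside the repository, so it is not covered by this check.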

Further Resources

View an example of a FAIR Jupyter Notebook (DOI: 10.5281/zenodo.7690164) on GitHub.

If you wish to create your own FAIR Jupyter Notebook, you can follow this workflow file on GitHub.

References

  1. Mendez, K. M., Pritchard, L., Reinke, S. N., & Broadhurst, D. I. (2019). Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics, 15(10), 125. https://doi.org/10.1007/s11306-019-1588-0
  2. Pimentel, J. F., Murta, L., Braganholo, V., & Freire, J. (2019). A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 507–517. https://doi.org/10.1109/MSR.2019.00077
  3. Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), Article 1. https://doi.org/10.1038/sdata.2016.18
  4. Open science. (2023). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Open_science&oldid=1148655421
  5. Barker, M., Chue Hong, N. P., Katz, D. S., Lamprecht, A.-L., Martinez-Ortiz, C., Psomopoulos, F., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., & Honeyman, T. (2022). Introducing the FAIR Principles for research software. Scientific Data, 9(1), 622. https://doi.org/10.1038/s41597-022-01710-x
  6. Netherlands eScience Center and DANS. (n.d.). FAIR Research Software. FAIR Research Software. Retrieved February 28, 2023, from https://fair-software.nl/recommendations/repository
  7. Adding a license to a repository. (n.d.). GitHub Docs. Retrieved March 1, 2023, from https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository
  8. Choose languages for your environment—Binder 0.1b documentation. (n.d.). Retrieved March 1, 2023, from https://mybinder.readthedocs.io/en/latest/howto/languages.html
  9. Get started with Binder—Binder 0.1b documentation. (n.d.). Retrieved March 2, 2023, from https://mybinder.readthedocs.io/en/latest/introduction.html
  10. colab-github-demo.ipynb—Colaboratory. (n.d.). Retrieved March 2, 2023, from https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=-pVhOfzLx9us
  11. Referencing and citing content. (n.d.). GitHub Docs. Retrieved March 1, 2023, from https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content