
One neat way of sharing your research workflow - whether that is your data cleaning process, the statistical modelling that you have applied to your data set, or how you have visualised your results - is through Jupyter notebooks.

Working with sensitive data may, however, create barriers to sharing your Jupyter notebook: you do not want to commit a Jupyter notebook containing sensitive data to GitHub.

You can get around this hurdle by manually clearing your notebook’s output before every Git commit. This process is, however, time-consuming, cumbersome and - most importantly - extremely error-prone: you only need to forget to clear the output once to inadvertently expose your data. A much more efficient and reliable way of doing this is to use nbstripout.

nbstripout

nbstripout is a utility that - when used as a Git filter or pre-commit hook - automatically strips the Jupyter (or IPython/Zeppelin) notebook output before Git even gets the chance to see it. In other words, it simulates the Clear All Output procedure in the Jupyter notebook user interface.
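To make concrete what "stripping the output" means, here is a minimal Python sketch that clears the outputs and execution counts of the code cells in a notebook's JSON. It only illustrates the idea, assuming the standard nbformat 4 layout; it is not nbstripout's actual implementation, and the function name strip_outputs and the example notebook are made up for this illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook (nbformat 4 JSON) with code-cell outputs removed."""
    stripped = json.loads(json.dumps(notebook))  # simple deep copy of JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed values, tables, plots
            cell["execution_count"] = None  # reset the In [n] counter
    return stripped


# Hypothetical one-cell notebook whose output contains sensitive data.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print(patient_name)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["Jane Doe\n"]}
            ],
        }
    ],
}

clean = strip_outputs(example)
print(json.dumps(clean["cells"][0], indent=1))
```

The source code of the cell is kept, while the sensitive output never reaches the stripped copy. The original notebook is left untouched, which mirrors how a Git filter changes what is committed rather than the file you are working on.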

Installing nbstripout

The latest version of nbstripout can be installed from PyPI (the Python Package Index) using the command pip install --upgrade nbstripout. If you are using Anaconda, you can install nbstripout via the conda package manager instead: conda install -c conda-forge nbstripout.
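If you work with several Python or conda environments, it can be worth confirming that nbstripout ended up in the one you are currently using. The following sketch uses only the Python standard library; the function name nbstripout_available is hypothetical, and it assumes the package exposes both an importable module and a command-line script named nbstripout.

```python
import importlib.util
import shutil


def nbstripout_available() -> dict:
    """Report whether the nbstripout module and command-line script can be found."""
    return {
        # Is the package importable from this Python interpreter?
        "module": importlib.util.find_spec("nbstripout") is not None,
        # Is an executable called nbstripout on the current PATH?
        "command": shutil.which("nbstripout") is not None,
    }


print(nbstripout_available())
```

If either value is False, the installation most likely went into a different environment from the one that is currently active.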

Setting up the git filter and attributes

Once nbstripout is installed, you need to add it to your local Git repository. Start by creating a new repository or navigating to one that you are already using. Once there, add nbstripout using the command nbstripout --install.

You can check that nbstripout has successfully been applied as a filter by running the command cat .git/config, or check the Git attributes by running the command cat .git/info/attributes.
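If you would rather perform this check from Python, the sketch below looks for a filter=nbstripout entry in the attributes file. The function names are hypothetical, and the expected line format (something like *.ipynb filter=nbstripout) is an assumption about what nbstripout --install writes, so compare it with what you actually see in your own .git/info/attributes.

```python
from pathlib import Path


def has_nbstripout_filter(attributes_text: str) -> bool:
    """Return True if any non-comment line assigns the nbstripout filter."""
    for line in attributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "filter=nbstripout" in line.split():
            return True
    return False


def repo_has_filter(repo_dir: str = ".") -> bool:
    """Check .git/info/attributes inside repo_dir (assumed to be the repository root)."""
    attributes = Path(repo_dir) / ".git" / "info" / "attributes"
    return attributes.is_file() and has_nbstripout_filter(attributes.read_text())


# Assumed example of the attributes file contents after installation.
sample = "*.ipynb filter=nbstripout\n*.ipynb diff=ipynb\n"
print(has_nbstripout_filter(sample))
```

Calling repo_has_filter() from the root of your repository performs the same check on the real file.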

Removing the git filter and attributes

If you decide that you would like to remove nbstripout, simply run nbstripout --uninstall whilst in the repository.

Installing nbstripout globally

nbstripout is generally installed in one local Git repository at a time, so that you can control when it is applied as a filter.

However, if all of your notebooks deal with sensitive data, it might be a good idea to install nbstripout globally across all of your Git repositories. This way, no notebooks risk slipping under the radar.

To install nbstripout globally, run the command: nbstripout --install --global
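If you prefer to keep the filter local but have several repositories to set up, the commands above can be scripted. The following is a minimal sketch, assuming nbstripout is on your PATH and that each path points to the root of an existing Git repository; the names COMMANDS and run_nbstripout, and the repository paths in the usage comment, are hypothetical.

```python
import subprocess

# The three commands described above, as argument lists.
COMMANDS = {
    "install": ["nbstripout", "--install"],
    "uninstall": ["nbstripout", "--uninstall"],
    "install-global": ["nbstripout", "--install", "--global"],
}


def run_nbstripout(action: str, repo_dir: str = ".") -> None:
    """Run one of the commands above from inside repo_dir.

    Raises KeyError for an unknown action and
    subprocess.CalledProcessError if nbstripout exits with an error.
    """
    subprocess.run(COMMANDS[action], cwd=repo_dir, check=True)


# Example usage (not executed here; the repository paths are placeholders):
# for repo in ["projects/study-a", "projects/study-b"]:
#     run_nbstripout("install", repo)

print(" ".join(COMMANDS["install-global"]))
```

Installing per repository in a loop keeps the choice explicit for each project, whereas the global option covers every repository at once, including ones you create later.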