Be Reproducible {#reproducible}

(ref:reproducible-intro)

You Must - (ref:reproducible-must)

You Should - (ref:reproducible-should)

You Could - (ref:reproducible-could)

| Related Areas: | Demonstrably Correct, Documentation |
|----------------|-------------------------------------|

Unambiguous Documentation for Reproducibility {#unambiguous_docs}

To be able to reproduce your analysis a colleague may need the following:

At the most basic level, documenting all of these will go a long way towards making your analysis reproducible. However, documentation alone might not make it easy to reproduce.
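One thing a colleague will usually need is a description of the software environment the analysis ran in. The sketch below shows one possible way to record this in Python using only the standard library. The function name `describe_environment` and the choice of fields are assumptions made for illustration, not something this guide prescribes.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect the Python version, operating system, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip any distribution whose metadata has no name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the description as JSON so it can be saved alongside the analysis.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output next to your results gives a colleague a record of what was installed, although it does not by itself recreate the environment.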

Portability {#portability}

There are some simple things you can do to improve the chance that your code runs on other computers:
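One commonly suggested step, offered here as an illustration rather than as an item from this guide's own list, is to avoid hard-coded absolute file paths and to build paths relative to the project folder instead. A minimal Python sketch follows; the helper `project_path` and the file name `input.csv` are hypothetical.

```python
from pathlib import Path


def project_path(*parts, root=None):
    """Build a path relative to the project root instead of a hard-coded absolute path.

    If no root is given, the current working directory is assumed to be the project root.
    """
    base = Path(root) if root is not None else Path.cwd()
    return base.joinpath(*parts)


# Hypothetical file name, for illustration only.
input_file = project_path("data", "input.csv")
```

Because `pathlib` handles the path separator for each operating system, the same code should work whether a colleague runs it on Windows, macOS, or Linux, provided they start from the project folder.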

Project Structure {#projects}

Most languages offer tools and templates for a project-based workflow. Typically these include a way of organising the following components:

By following a standard template for these components you can take advantage of workflow tools provided by your IDE which make it easier to:

All of these things are good for sharing or collaborating with others.

See R at DHSC and Python at DHSC for more information.

Reproducible Analytical Pipelines {#rap_section}

There is a government community dedicated to the production of reproducible analysis. See Reproducible Analytical Pipelines for more.

Packages and Modules {#packages}

Most languages have a standard structure which is used to share code and documentation with other people. You will likely have used code in this structure (libraries / packages / modules) when performing your analysis. Typically these structures include documentation, information about dependencies, and tests.

There is no reason you can't use the same approach to sharing your analysis!

See R at DHSC and Python at DHSC for more information.

Containers / Docker {#containers}

Containers allow you to manage the whole environment which a bit of code runs in. They are powerful but perhaps more technically involved than packaging your code or using project structures to manage your environment.

Docker is a containerisation platform, which lets you reproduce environments with a wider scope than just the packages present. With Docker you can manage the entire environment from the operating system and network up (including any packages).

You can use tools such as docker-compose and Kubernetes to manage groups of containers relative to one another.



DataS-DHSC/coding_principles_book documentation built on March 11, 2020, 4:13 a.m.