Review guide for analysis best practice, developed at rOpenSci unconf 2017

Related: checkers - a package to assess analysis

Rationale

While every analysis is different, there are common elements that can strengthen validity, reproducibility, and reusability. These guidelines describe and prioritize those elements to help analysts develop the strongest analyses and workflows possible, while remaining flexible enough for a wide variety of applications and contexts.

Terminology

Tiers

Tiers are listed in descending order of importance: focus on Tier 1 (T1) elements first, then Tier 2 (T2), then Tier 3 (T3).

Automation

Prerequisite

Clear research question(s) which can be answered by your available data.

This might include “What patterns do we see?” and other exploratory analyses as well as more formal hypotheses, but questions and analysis plans should be clearly defined.

Data

Know your Data/Data Structure (T1, Human)

In order to conduct robust analyses and make reliable inferences, we have to understand our data.

Examples

Package Suggestions

dplyr and tidyr for tidying data; visdat for visualizing missingness
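
A first pass at understanding a dataset can be sketched in base R as follows. The data frame `dat` and its columns are hypothetical stand-ins for your own data; the packages suggested above provide richer tools for the same job.

```r
# Hypothetical example data; replace with your own data frame.
dat <- data.frame(
  id     = 1:5,
  age    = c(34, 51, NA, 28, 45),
  region = c("north", "south", "south", NA, "east"),
  stringsAsFactors = FALSE
)

str(dat)       # column types and a preview of the values
summary(dat)   # ranges for numeric columns, basic info for the others

# Number of missing values in each column
missing_per_column <- colSums(is.na(dat))
missing_per_column

# Class of each column, to confirm variables were read as the expected type
column_classes <- vapply(dat, function(x) class(x)[1], character(1))
column_classes
```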

Quality Checks (T1, Human)

Check data for reasonable values.

Examples of unreasonable values:

Package Suggestions

assertr
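
As a minimal sketch of a reasonable-value check, the base R code below flags rows that fall outside plausible ranges. The column names and limits (ages between 0 and 120, proportions between 0 and 1) are assumptions chosen for illustration, not rules from this guide.

```r
# Hypothetical data; the column names and limits are assumptions.
dat <- data.frame(
  age        = c(34, 51, -2, 28, 130),
  proportion = c(0.2, 0.9, 1.4, 0.5, 0.0)
)

# Row numbers whose values fall outside the range considered plausible
bad_age        <- which(dat$age < 0 | dat$age > 120)
bad_proportion <- which(dat$proportion < 0 | dat$proportion > 1)

bad_age         # rows with implausible ages
bad_proportion  # rows with proportions outside [0, 1]

# Collect the offending rows for inspection rather than silently dropping them
flagged <- dat[sort(unique(c(bad_age, bad_proportion))), ]
flagged
```

Keeping the flagged rows in their own object makes it easy to review them with a collaborator before deciding whether they are errors or genuine outliers. The assertr package suggested above can express similar checks inside a pipeline.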

Rawest data form (T1, Human)

To retain as much information as possible, and to avoid duplicated effort and human error, data should be loaded in a form as close to its original state as is reasonable.

Examples

Check and control for updated source data (T2, Semi-Auto)

Do the column names and dimensions in your current data files match an expected set of names and dimensions? If rows or columns are added or deleted, your results and the stability of your scripts and models might be affected.
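
One way to automate this comparison is sketched below in base R. The function `check_structure()`, the expected column names, and the example data are all hypothetical; the idea is only to record the expected structure once and stop with an informative error when the source data no longer matches.

```r
# Expected structure, recorded when the analysis was written (hypothetical values)
expected_names <- c("id", "age", "region")

# Format a character vector for an error message
fmt <- function(x) if (length(x) == 0) "none" else paste(x, collapse = ", ")

# Stop with an informative error if the columns (and optionally the row count) differ
check_structure <- function(dat, expected_names, expected_nrow = NULL) {
  missing_cols    <- setdiff(expected_names, names(dat))
  unexpected_cols <- setdiff(names(dat), expected_names)
  if (length(missing_cols) > 0 || length(unexpected_cols) > 0) {
    stop("Source data has changed. Missing columns: ", fmt(missing_cols),
         "; unexpected columns: ", fmt(unexpected_cols), call. = FALSE)
  }
  if (!is.null(expected_nrow) && nrow(dat) != expected_nrow) {
    stop("Source data has changed. Expected ", expected_nrow,
         " rows but found ", nrow(dat), call. = FALSE)
  }
  invisible(TRUE)
}

current <- data.frame(id = 1:3, age = c(34, 51, 28),
                      region = c("north", "south", "east"))
check_structure(current, expected_names, expected_nrow = 3)  # passes silently

# Simulate an updated source file that has lost a column
changed <- current[, c("id", "age")]
result <- tryCatch(
  check_structure(changed, expected_names),
  error = function(e) conditionMessage(e)
)
result
```

Running a check like this at the top of a script means a changed source file is reported immediately, instead of surfacing later as a confusing modelling error.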

Examples

Ownership clearly identified (T3, Human)

At minimum, the source of the data should be noted, along with any necessary acknowledgements. If the data is publicly available and licensed, the license should be included.

Scripts and Code

Fail fast (code should break early so that problems can be found and fixed) (T2, Human)

Examples

Robust to changes (T2, Semi-Auto)

Examples

Don't Repeat Yourself (T2, Semi-Auto)

Examples

Best Coding Practices (T2, Auto)

Examples

Platform agnostic (T2, Auto)

Examples

Labelled chunks (T2, Auto)

Examples

Library (T3, Auto)

Examples

Streamlining objects in memory (T3, Auto)

Examples

License and ownership (T3, Auto)

Examples

Packages / Organisation

Neglected Packages (T1, Auto)

Examples

Readability (T1, Auto)

Examples

Expressive Object names (T2, Human)

Examples

File structure (T2, Semi-Auto)

Examples

Long Blocks of old/commented out code (T2, Auto)

Examples

Version control (T2, Auto)

Examples

README for complex data structures (T2, Auto)

Examples

Package versions (Know your options) (T2, Auto)

Examples

Readability (Specifically, naming conventions) (T2, Auto)

Examples

Readability (Style: Adequate whitespace / indentation) (T3, Auto)

Examples

Analysis Tasks

Description of the root datasets (T1, Human)

Examples

Description of all analyses (T1, Human)

Examples

Model Assumption Checks (T1, Semi-Auto)

Examples

Model Goodness of fit (T1, Semi-Auto)

Examples

Model Validation (T1, Semi-Auto)

Examples

Diagnostics (T1, Semi-Auto)

Examples

Plan to handle missingness (T1, Semi-Auto)

Examples

Model Results (T1, Semi-Auto)

Examples

Record of decisions (T2, Human)

Examples

Visualisation / Reporting

Informative Titles and Labels (T1, Human)

Examples

Final Product Geared for its intended audience (T1, Human)

Examples

Visualisation of conclusions (T1, Human)

Examples

Descriptive plots (T1, Semi-Auto)

Examples

Exploratory Data Analysis (T1, Semi-Auto)

Examples

Appropriately Conveyed Data Types (T1, Semi-Auto)

Examples

Clear Structure and Flow (T1, Auto)

Examples

Reusable outputs (T2, Human)

Examples

Remove chartjunk (T3, Human)

Examples

Grammar and Spelling (T3, Auto)

Examples



ropenscilabs/checkers documentation built on May 20, 2022, 10:58 a.m.