Review guide for analysis best practice, developed at rOpenSci unconf 2017
Related: checkers - a package to assess analysis
While every analysis is different, there are common elements which can strengthen validity, reproducibility, and reusability. These guidelines describe and prioritize those elements to help analysts develop the strongest analyses and workflows possible while remaining flexible enough for a wide variety of applications and contexts.
Tiers are in descending order of importance: Focus on Tier 1 elements, then Tier 2, then Tier 3.
Clear research question(s) which can be answered by your available data.
This might include “What patterns do we see?” and other exploratory analyses as well as more formal hypotheses, but questions and analysis plans should be clearly defined.
In order to conduct robust analyses and make reliable inferences, we have to understand our data.
Tools: dplyr and tidyr for tidying data; visdat for visualizing missingness.
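A minimal sketch of a first-look workflow with these packages; the file name `raw_data.csv` and its contents are hypothetical:

```r
# Sketch: inspect structure and missingness before any analysis.
# "raw_data.csv" is a hypothetical file name.
library(dplyr)
library(visdat)

raw <- read.csv("raw_data.csv", stringsAsFactors = FALSE)

glimpse(raw)    # column types and a preview of values
vis_dat(raw)    # plot of column classes and missingness per cell
vis_miss(raw)   # plot focused on where values are missing
```

`vis_miss()` makes clusters of missing values visible at a glance, which often reveals structural problems (e.g. a whole instrument failing on certain days) that summary statistics hide.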
Check data for reasonable values.
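One way to make "reasonable values" an explicit, repeatable check is to assert plausible ranges in the script itself; the column names (`age`, `weight_kg`) below are hypothetical:

```r
# Sketch: fail loudly if values fall outside plausible ranges.
# Column names are hypothetical examples.
stopifnot(
  all(raw$age >= 0 & raw$age <= 120, na.rm = TRUE),
  all(raw$weight_kg > 0, na.rm = TRUE)
)
summary(raw)  # eyeball ranges, quartiles, and NA counts per column
```

Unlike a one-off visual inspection, these assertions rerun every time the script does, so a bad data refresh stops the analysis instead of silently contaminating it.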
To retain as much information as possible, and to avoid duplication of effort and human error, data should be loaded in a form as close to its original state as is reasonable.
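In practice this means reading the raw file untouched and deriving cleaned variables in code, rather than editing the data file by hand. A sketch, with hypothetical paths and column names:

```r
# Sketch: keep the raw file pristine; all transformations live in code.
# Paths and column names are hypothetical.
library(dplyr)

raw <- read.csv("data/raw/survey_raw.csv", stringsAsFactors = FALSE)

clean <- raw %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d")) %>%
  filter(!is.na(id))
```

Because every cleaning step is scripted, the path from original data to analysis-ready data is fully documented and can be rerun when the source data is updated.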
Do the column names and dimensions in your current data files match an expected set of names/dimensions? If rows or columns are added or deleted, your results and the stability of your scripts and models might be affected.
Check dim() of all datasets.

At minimum, the source of the data should be noted, along with any necessary acknowledgements. If the data is publicly available and licensed, the license should be included.
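The name/dimension checks above can be sketched as a fail-fast guard at the top of an analysis script; the expected names here are hypothetical:

```r
# Sketch: stop the script if the data's shape drifts from what the
# analysis expects. Expected names/dimensions are hypothetical.
expected_names <- c("id", "date", "age", "weight_kg")

stopifnot(
  identical(names(raw), expected_names),
  ncol(raw) == length(expected_names),
  nrow(raw) > 0
)
dim(raw)  # record the dimensions of each dataset alongside results
```

Recording dim() output in the analysis log gives a quick reference point when a later rerun produces different results: a changed row count is often the first clue.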