README.md

Overview

The skelpear package belongs to the pearsonverse, a set of packages that facilitates the data science process in R. Its main goal is to support teams by building an identical project environment for every member and by maintaining reproducibility. It depends mainly on the ProjectTemplate package.

Installation

First, install the pearsonverse package. It will install all *pear packages.

devtools::install_github("pearsonplc/pearsonverse")

However, if you want to install just the skelpear package:

devtools::install_github("pearsonplc/skelpear")

Main functions

  1. project_create()
  2. snapshot_pkg() & compare_snapshot()
  3. docker_snapshot()

Goals

1. Building project environment

1.1 project_create()

A function that builds the project skeleton and automatically opens a new session. The skeleton contains several pre-defined directories and files; see the Project structure section for details. For example:

project_create(name = "example_project", path = ".")

The function automatically initialises a git repository. To push your project to Bitbucket, two things have to be done:

  1. Create a repo on Bitbucket.
  2. Use git remote add origin <remote_URL> to link your local project with the repo on Bitbucket, e.g. git remote add origin https://link_to_your_repo.git.

After that, you're ready to push your commits.

2. Maintaining reproducibility

2.1 snapshot_pkg() & compare_snapshot()

A pair of functions that save and compare the set of packages used during the project. They are especially useful when several team members are involved in code development.

The snapshot_pkg() function saves the package environment in the config/packages.dcf file. It looks through all R scripts and config/global.dcf within a project to find uses of library, require or ::. Once you push the file to the Bitbucket repository, anybody can pull it and compare it with their local package environment via the compare_snapshot() function.
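The scanning step described above can be illustrated with a minimal base R sketch. It is not the skelpear implementation: find_packages() is a hypothetical helper, and the regular expressions are an assumption about what "an execution of library, require or ::" might look like in code. It does not skip commented-out lines.

```r
# Hypothetical helper (not part of skelpear): collect package names that
# appear in library()/require() calls or before a :: / ::: operator.
find_packages <- function(code_lines) {
  # Return every full regex match found in any of the lines
  grab <- function(pattern) {
    unlist(regmatches(code_lines, gregexpr(pattern, code_lines, perl = TRUE)))
  }

  # Matches such as `library(dplyr` or `require("ggplot2`
  calls <- grab("\\b(?:library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*")
  # Strip everything up to and including the opening bracket and quote
  calls <- sub("^.*\\(\\s*[\"']?", "", calls, perl = TRUE)

  # A name immediately followed by :: or :::
  namespaced <- grab("\\b[A-Za-z][A-Za-z0-9.]*(?=:::?)")

  sort(unique(c(calls, namespaced)))
}

code <- c(
  "library(dplyr)",
  "require(\"ggplot2\")",
  "x <- stringr::str_trim(y)",
  "z <- 1 + 1"
)
find_packages(code)
```

In a real project the input could come from readLines() applied to each R script; how skelpear itself reads the files is not stated in this README.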

Both functions return a message only when the snapshot process fails. Below, the local environment is identical to the snapshot.

How to read compare_snapshot summary

The summary consists of three sections:

The Packages to install section lists packages which are not installed locally but are critical to the project.

The Packages to reinstall section lists packages whose version in the local environment differs from the one in config/packages.dcf. In the following example, dplyr v0.7.4 was detected in your local environment, but one of your colleagues uses dplyr v0.7.2. You should decide together which dplyr version you want to use.

The Packages to save section lists packages which were detected in your local environment but have not been included in config/packages.dcf yet. You can include them by executing the snapshot_pkg() function. But be careful: first you have to resolve all conflicts in the first two sections.
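The three sections can be thought of as simple set operations on package names and versions. The sketch below is only an illustration under that assumption: classify_packages() is a hypothetical helper, not a skelpear function, and it takes two named character vectors of versions instead of reading config/packages.dcf.

```r
# Hypothetical helper (not part of skelpear).
# snapshot: versions recorded for the project, named by package
# local:    versions of the packages detected locally, named by package
classify_packages <- function(snapshot, local) {
  shared <- intersect(names(snapshot), names(local))
  list(
    # In the snapshot but missing locally
    to_install = setdiff(names(snapshot), names(local)),
    # Present in both, but with a different version
    to_reinstall = shared[snapshot[shared] != local[shared]],
    # Detected locally but not yet in the snapshot
    to_save = setdiff(names(local), names(snapshot))
  )
}

snapshot <- c(dplyr = "0.7.2", tidyr = "0.8.0")
local    <- c(dplyr = "0.7.4", ggplot2 = "2.2.1")
classify_packages(snapshot, local)
```

With these made-up inputs, tidyr lands in the first group, dplyr in the second and ggplot2 in the third, which mirrors the dplyr v0.7.4 versus v0.7.2 case described above.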

2.2 docker_snapshot()

A function that creates a chunk of code with package installation commands for a Dockerfile. It stores those commands in memory, so you can just paste them (Cmd + V) into the Dockerfile.
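The exact commands produced by docker_snapshot() are not shown in this README. As a rough idea of what such a chunk might look like, the sketch below builds one RUN line per package from a named vector of versions. Both the hypothetical docker_install_lines() helper and the choice of remotes::install_version() are assumptions; the latter would require the remotes package to be available in the image.

```r
# Hypothetical helper (not part of skelpear): turn a named vector of
# package versions into Dockerfile RUN lines. The output format is an
# assumption, not the format docker_snapshot() actually emits.
docker_install_lines <- function(versions) {
  sprintf(
    "RUN R -e \"remotes::install_version('%s', version = '%s')\"",
    names(versions), unname(versions)
  )
}

lines <- docker_install_lines(c(dplyr = "0.7.4", tidyr = "0.8.0"))
# Print the lines so they can be copied into a Dockerfile
cat(lines, sep = "\n")
```

Pinning each package to an explicit version is what makes the image reproducible across team members.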

Project structure

Once you use the project_create() function, you will get a new project directory with several pre-defined empty directories and files. Below you can find a short description of each component:

->cache/ - a directory with cached data objects.
->config/ - a directory with one file, global.dcf, which is responsible for everything that happens while opening the .Rproj file.
->data/ - a directory with data files (.csv, .RData etc.).
->graphs/ - a directory with graphs created during the project.
->misc/ - a directory with the rest of the relevant files.
->munge/ - a directory with all pre-processing R scripts, which are executed while opening the .Rproj file.
->reports/ - a directory with presentations and a shiny app.
->sql/ - a directory with sql queries.
->src/ - a directory with R scripts related to data analysis.
->.Rprofile
->project.Rproj
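To confirm that a project still follows this layout, a small base R check can help. The sketch below is not part of skelpear: missing_dirs() is a hypothetical helper, and the directory names are taken from the list above.

```r
# Directory names taken from the project structure described above
expected_dirs <- c("cache", "config", "data", "graphs", "misc",
                   "munge", "reports", "sql", "src")

# Hypothetical helper (not part of skelpear): return the expected
# directories that do not exist under project_path.
missing_dirs <- function(project_path) {
  expected_dirs[!dir.exists(file.path(project_path, expected_dirs))]
}

# Demonstration on a throwaway directory that lacks sql/
demo_path <- tempfile("example_project")
dir.create(demo_path)
for (d in setdiff(expected_dirs, "sql")) dir.create(file.path(demo_path, d))
missing_dirs(demo_path)
```

An empty result means every expected directory is present.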



pearsonplc/skelpear documentation built on May 30, 2019, 3:45 p.m.