The skelpear package belongs to the pearsonverse, a set of packages which facilitates the data science process in R. The main goal of this package is to support teams by building an identical project environment and maintaining reproducibility. It depends mainly on the ProjectTemplate package.
First install the pearsonverse package. It will install all *pear packages.
devtools::install_github("pearsonplc/pearsonverse")
However, if you want to install just the skelpear package:
devtools::install_github("pearsonplc/skelpear")
project_create()
snapshot_pkg() & compare_snapshot()
docker_snapshot()
project_create() builds the project skeleton and automatically opens a new session. The skeleton contains several pre-defined directories and files; see the Project structure section for more information. For example,
project_create(name = "example_project", path = ".")
The function automatically initialises a git environment. Then, to push your project to Bitbucket, two things have to be done:
git remote add origin <remote_URL>
to link your local project with the repository on Bitbucket, e.g. git remote add origin https://link_to_your_repo.git. After that, you're ready to push your commits.
snapshot_pkg() and compare_snapshot() are a pair of functions for saving and comparing the set of packages used during the project. They are especially useful when several team members are involved in code development.
The snapshot_pkg() function saves the package environment in the config/packages.dcf file. It looks through all R scripts and config/global.dcf within a project to find calls to library, require or ::. Once you push the file to the Bitbucket repository, anybody can pull it and compare it to their local package environment via the compare_snapshot function.
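The scanning step described above can be sketched in base R. This is only an illustration of the idea, not skelpear's actual implementation: the helper name find_used_packages and the regular expressions are assumptions, and the sketch handles only R source lines (not the config/global.dcf file).

```r
# Hypothetical helper (not part of skelpear): find package names that
# appear in library(), require() or pkg:: calls within R source lines.
find_used_packages <- function(code_lines) {
  # Naive comment stripping: ignores commented-out calls, but would also
  # cut a '#' that sits inside a string literal.
  code_lines <- sub("#.*$", "", code_lines)

  # library(pkg), library("pkg"), require(pkg), require('pkg')
  attach_pattern <- "\\b(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  # pkg::fun and pkg:::fun
  ns_pattern <- "([A-Za-z][A-Za-z0-9.]*):{2,3}"

  extract <- function(pattern) {
    matches <- unlist(regmatches(code_lines,
                                 gregexpr(pattern, code_lines, perl = TRUE)))
    # Reduce each full match to the captured package name.
    sub(pattern, "\\1", matches, perl = TRUE)
  }

  sort(unique(c(extract(attach_pattern), extract(ns_pattern))))
}

# Made-up script content, for illustration only.
example_script <- c(
  "library(dplyr)",
  "require(\"ggplot2\")",
  "x <- stats::median(c(1, 2, 3))",
  "# library(ignored)"
)
used <- find_used_packages(example_script)
print(used)
```

A regex scan like this is deliberately simple; parsing the scripts with parse() would be more robust, at the cost of failing on files that contain syntax errors.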
Both functions return a message only when the snapshot process fails. Below, the local environment is identical to the snapshot.
How to read the compare_snapshot summary
The summary consists of three sections:
The Packages to install section lists packages which are not installed locally but are critical to the project.
The Packages to reinstall section lists packages whose version in the local environment differs from the one recorded in config/packages.dcf. In the following example, dplyr v0.7.4 was detected in your local environment, but one of your colleagues uses dplyr v0.7.2. You should decide together which dplyr version you want to use.
The Packages to save section lists packages which were detected in your local environment but have not been included in config/packages.dcf yet. You can include them by executing the snapshot_pkg() function. But be careful: first you have to resolve all conflicts in the first two sections.
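The three sections can be thought of as simple set operations on package names and versions. The sketch below is a rough illustration under assumptions: the function classify_packages, its arguments, and the tidyr and ggplot2 entries are all hypothetical, and skelpear may compute the summary differently.

```r
# Hypothetical sketch (not skelpear's code): sort packages into the three
# summary sections.
#   snapshot   - named character vector, package -> version in the snapshot
#   installed  - named character vector, package -> locally installed version
#   local_used - names of the packages detected in the local project
classify_packages <- function(snapshot, installed, local_used) {
  snap_pkgs <- names(snapshot)

  # Recorded in the snapshot but not installed locally.
  to_install <- setdiff(snap_pkgs, names(installed))

  # Installed locally, but in a different version than the snapshot records.
  shared <- intersect(snap_pkgs, names(installed))
  to_reinstall <- shared[installed[shared] != snapshot[shared]]

  # Used locally but not yet recorded in the snapshot.
  to_save <- setdiff(local_used, snap_pkgs)

  list(install = to_install, reinstall = to_reinstall, save = to_save)
}

# Illustrative data: the dplyr versions mirror the example in the text,
# the other entries are made up.
snapshot   <- c(dplyr = "0.7.2", tidyr = "0.8.0")
installed  <- c(dplyr = "0.7.4", ggplot2 = "2.2.1")
local_used <- c("dplyr", "ggplot2")

result <- classify_packages(snapshot, installed, local_used)
str(result)
```

In this made-up case tidyr lands in the install section, dplyr in the reinstall section, and ggplot2 in the save section.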
docker_snapshot() creates a chunk of code with package installation commands for a Dockerfile. It stores those commands in memory (the clipboard), so you can just paste them (Cmd + V) into the Dockerfile.
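To give a feel for what such a chunk might look like, here is a minimal sketch that turns a package-to-version map into Dockerfile RUN lines. The exact commands docker_snapshot() emits are not shown in this document, so the helper dockerfile_lines and the use of remotes::install_version are assumptions chosen for illustration.

```r
# Hypothetical sketch (not skelpear's output format): build one Dockerfile
# RUN line per package, pinning the version with remotes::install_version().
# Assumes the remotes package is already available in the Docker image.
dockerfile_lines <- function(pkgs) {
  sprintf(
    "RUN R -e \"remotes::install_version('%s', version = '%s')\"",
    names(pkgs), unname(pkgs)
  )
}

# The dplyr version is taken from the example in the text.
run_lines <- dockerfile_lines(c(dplyr = "0.7.4"))
cat(run_lines, sep = "\n")
```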
Once you use the project_create() function, you will get a new project directory with several pre-defined empty directories and files. Below you can find a short description of each component:
->cache/ - a directory with cached data objects.
->config/ - a directory with one file, global.dcf. It is responsible for everything that happens while opening the .Rproj file.
->data/ - a directory with data files (.csv, .RData etc.).
->graphs/ - a directory with graphs created during the project.
->misc/ - a directory with the rest of the relevant files.
->munge/ - a directory with all pre-processing R scripts, which are executed while opening the .Rproj file.
->reports/ - a directory with presentations and a Shiny app.
->sql/ - a directory with sql queries.
->src/ - a directory with R scripts related to data analysis.
->.Rprofile
->project.Rproj