README.md
In keynmol/superdrive: Help your brain create reproducible environments for your research.

pinky

You're the Brain of the operation. You know what's going on and you know what to do. You want to do it quick. Ever needed to put your application on a bigger machine that has none of your dependencies? It's a pain in the ass if you didn't think about it before writing that huge pile of convoluted functions that definitely couldn't exist in any other way.

Well, pinky is here to help. It aims to introduce conventions and helper scaffolding functions to make your R applications more scalable, self-contained and ready for the reproducible data science (r)evolution we've been dreaming about since we were kids.

Prerequisites: you should have R installed and running (duh), and packrat and devtools installed into the global or user library.

Installation: devtools::install_github('keynmol/pinky')

Getting started

TL;DR:

Create project: pinky::scaffold("/path/to/project/")
Put your dirty data in /path/to/project/data/
Put your dependencies in /path/to/project/includes/libraries.R
Put your helper functions in /path/to/project/includes/functions.R
Put your data cleanup procedures in /path/to/project/includes/data.R
Generate a binary snapshot of the clean data with tools/generate_data_snapshot.R
Put your experiments in the root folder with source("includes/loader.R") at the top.
Put your RMarkdown reports in /path/to/project/includes/reports and use source("includes/loader.R") there.
Run packrat::snapshot() each time you add a new dependency
Bundle the application with packrat::bundle()
(optional) run experiment files using littler: r -L packrat experiment1.R
(optional) use inject and funfact to keep your data and functions immutable.
Be happy.

Starting a new project? Just run

pinky::scaffold("/path/to/project/")

And it will create the following folder structure:

/path/to/project
    ├── data
    ├── includes
    │   ├── data.R
    │   ├── functions.R
    │   ├── libraries.R
    │   └── loader.R
    ├── packrat
    └── tools
        └── generate_data_snapshot.R

It will also try and run packrat::init() for your new project folder, which will create the folder structure and files packrat loves so much.
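For a feel of what the scaffolding amounts to, here is a rough base-R sketch that builds the same layout by hand. It is not pinky's actual implementation: `scaffold_sketch` is a made-up name, the files are created empty, and the packrat::init() step is left out entirely.

```r
# Hypothetical stand-in for pinky::scaffold(); it only illustrates the layout.
scaffold_sketch <- function(path) {
  # Folders from the tree above (packrat/ is normally set up by packrat::init()).
  dirs <- file.path(path, c("data", "includes", "packrat", "tools"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

  # Empty placeholder files; the real scaffold puts content in some of them.
  files <- file.path(path, c(
    "includes/data.R",
    "includes/functions.R",
    "includes/libraries.R",
    "includes/loader.R",
    "tools/generate_data_snapshot.R"
  ))
  file.create(files)
  invisible(path)
}

# Try it in a throwaway directory and list what was created.
project <- file.path(tempdir(), "pinky_layout_demo")
scaffold_sketch(project)
list.files(project, recursive = TRUE)
```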

includes/loader.R

An entry point and the holy grail of your self-contained reproducible environment. This is what it looks like:

## LOAD LIBRARIES
source("includes/libraries.R")
## LOAD USER FUNCTIONS
source("includes/functions.R")
## LOAD DATA
source("includes/data.R")

Source this file at the top of each experiment file and forget about missing dependencies, functions or data variables.
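To see the chain in action, here is a small sketch that builds a toy project in a temporary directory and runs an experiment file against it. Everything inside the files (`centre`, `heights`, `experiment1.R` and its contents) is made up for illustration. `source(..., chdir = TRUE)` is used only so that the relative paths resolve against the project root, the way they would if you ran the experiment from that folder.

```r
# Demo set-up: a throwaway project with tiny, hypothetical include files.
project <- file.path(tempdir(), "pinky_experiment_demo")
dir.create(file.path(project, "includes"), recursive = TRUE, showWarnings = FALSE)

write_file <- function(rel_path, lines) {
  writeLines(lines, file.path(project, rel_path))
}
write_file("includes/libraries.R", "library(stats)")
write_file("includes/functions.R", "centre <- function(x) x - mean(x)")
write_file("includes/data.R", "heights <- c(150, 160, 170)")
write_file("includes/loader.R", c(
  'source("includes/libraries.R")',
  'source("includes/functions.R")',
  'source("includes/data.R")'
))

# A hypothetical experiment in the project root: loader first, then the work.
write_file("experiment1.R", c(
  'source("includes/loader.R")',
  "centred_heights <- centre(heights)"
))

# chdir = TRUE temporarily switches the working directory to the project root.
source(file.path(project, "experiment1.R"), chdir = TRUE)
centred_heights
```

The experiment itself never mentions libraries, helper files or raw data; it only knows about the loader.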

includes/libraries.R

This is where package imports go. After scaffolding it only contains pinky itself:

library(pinky)

Put all your library imports here and you won't miss a library ever again.

includes/data.R

This is where data gets loaded, cleaned and pre-processed. It's a long process, we know, that's why you should use optional loading ([MORE ON THAT LATER WHEN I ACTUALLY IMPLEMENT IT]):

lol(didnt(implement))

Have all your data living in this file and your life will be slightly betterer.
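As an illustration only, this is roughly what a data.R could contain. The file name `measurements.csv`, its columns and the clean-up steps are all invented for the example; the first few lines just fabricate a dirty raw file in a temporary project so the sketch can run on its own.

```r
# Demo set-up: a throwaway project with one dirty raw file in data/.
project <- file.path(tempdir(), "pinky_data_demo")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("id,value", "1, 3.5", "2,NA", "3, 7.25", "3, 7.25"),
  file.path(project, "data", "measurements.csv")
)

# What includes/data.R might look like (hypothetical file and column names).
raw_path <- file.path(project, "data", "measurements.csv")
measurements_raw <- read.csv(raw_path, strip.white = TRUE)

# The clean-up lives in code, so the raw file itself is never edited.
measurements <- measurements_raw[!is.na(measurements_raw$value), ]  # drop missing values
measurements <- unique(measurements)                                # drop duplicated rows
```

The raw file stays dirty on disk, and every transformation applied to it is written down in one place.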

includes/functions.R

This is the entry point to the modular structure of your application - use it to link to other files that contain functions and classes you wrote for your project.
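One possible way to do that linking, sketched under assumptions: helper files live in a hypothetical includes/functions/ subfolder (pinky does not create it), and functions.R simply sources each of them. The two helpers, `rescale` and `rmse`, are made up for the demo.

```r
# Demo set-up: a throwaway project with two hypothetical helper files.
project <- file.path(tempdir(), "pinky_functions_demo")
fun_dir <- file.path(project, "includes", "functions")
dir.create(fun_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("rescale <- function(x) (x - min(x)) / (max(x) - min(x))",
           file.path(fun_dir, "rescale.R"))
writeLines("rmse <- function(a, b) sqrt(mean((a - b)^2))",
           file.path(fun_dir, "metrics.R"))

# What includes/functions.R might contain: source every file it links to.
function_files <- sort(list.files(fun_dir, pattern = "\\.R$", full.names = TRUE))
for (f in function_files) source(f)

# The helpers are now available to any experiment that sources the loader.
rescale(c(2, 4, 6))
rmse(c(1, 2, 3), c(1, 2, 5))
```

Listing the files explicitly, one source() call per file, works just as well and makes the load order obvious.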

tools/generate_data_snapshot.R

Data loading can take a long time for big datasets, so a very simple script is provided:

## LOADER
source("includes/loader.R")
## SAVE WORKSPACE SNAPSHOT
save.image("snapshot.Rdata")

The idea is that data and dependencies change quite infrequently, so it's a lot easier to keep the raw data untouched and dirty, and to keep all the transformations it has to undergo in data.R. To avoid running those transformations every time (they can be quite costly), loader.R will try to skip data.R if there's a snapshot.Rdata present in the root folder. Neat.
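The snapshot check could look something like the sketch below. This is a guess at the logic, not pinky's actual code: `load_data_sketch` is a hypothetical helper, and the demo uses separate environments and a temporary project purely so the two code paths can be compared side by side.

```r
# Hypothetical helper: use the snapshot if it exists, otherwise run data.R.
load_data_sketch <- function(project, envir = globalenv()) {
  snapshot <- file.path(project, "snapshot.Rdata")
  if (file.exists(snapshot)) {
    load(snapshot, envir = envir)  # fast path: restore the saved objects
    return("snapshot")
  }
  source(file.path(project, "includes", "data.R"), local = envir)  # slow path
  "data.R"
}

# Demo set-up: a throwaway project whose data.R builds a single object.
project <- file.path(tempdir(), "pinky_snapshot_demo")
dir.create(file.path(project, "includes"), recursive = TRUE, showWarnings = FALSE)
unlink(file.path(project, "snapshot.Rdata"))  # make sure we start without a snapshot
writeLines("clean <- data.frame(x = 1:3)",
           file.path(project, "includes", "data.R"))

first_env <- new.env()
first_run <- load_data_sketch(project, first_env)    # no snapshot yet, runs data.R

# Stand-in for tools/generate_data_snapshot.R, saving into the project root.
save(list = ls(first_env), envir = first_env,
     file = file.path(project, "snapshot.Rdata"))

second_env <- new.env()
second_run <- load_data_sketch(project, second_env)  # snapshot found, data.R skipped
```

Note the trade-off: once a snapshot exists, edits to data.R have no effect until the snapshot is regenerated or deleted.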

It's a good idea to store data the way it came from the source (or as close to it as possible) and keep track of all the applied transformations. Using binary snapshots saves the time otherwise spent running those transformations every single time.

TODO

Actually learn about R environments and remove the "shoot yourself in the balls" side effect of inject
Examples, vignettes, proper docs..
Better code organisation
Just be a better person in general.

keynmol/superdrive documentation built on May 20, 2019, 9:20 a.m.
