You're the Brain of the operation. You know what's going on and you know what to do. You want to do it quick. Ever needed to put your application on a bigger machine that has none of your dependencies? It's a pain in the ass if you didn't think about it before writing that huge pile of convoluted functions that definitely couldn't exist in any other way.
Well, pinky
is here to help. It aims to introduce conventions and helper scaffolding functions to make your R applications more scalable, self-contained and ready for the reproducible data science (r)evolution we've been dreaming about since we were kids.
Pre-requisities: you should have R installed and running(duh), packrat and devtools installed into global or user library.
Installation: devtools::install_github('keynmol/pinky')
TL;DR:
pinky::scaffold("/path/to/project/")
/path/to/project/data/
includes/libraries.R
/path/to/project/includes/functions.R
/path/to/project/includes/data.R
tools/generate_data_snapshot.R
source("includes/loader.R
)` at the top./path/to/project/includes/reports
and use source("includes/loader.R")
there.packrat::snapshot()
each time you add a new dependencypackrat::bundle()
littler
: r -L packrat experiment1.R
inject
and funfact
to keep your data and functions immutable.Starting a new project? Just run
pinky::scaffold("/path/to/project/")
And it will create a following folder structure:
/path/to/project
├── data
├── includes
│ ├── data.R
│ ├── functions.R
│ ├── libraries.R
│ └── loader.R
├── packrat
└── tools
└── generate_data_snapshot.R
It will also try and run packrat::init()
for your new project folder, which will create the folder structure and files packrat loves so much.
An entry point and the holy grail of your self-contained reproducible environment. This is what it looks like:
## LOAD LIBRARIES
source("include/libraries.R")
## LOAD USER FUNCTIONS
source("include/functions.R")
## LOAD DATA
source("include/data.R")
Include this file at the top of your file with experiments and forget about missing dependencies, functions or data variables.
This is where packages imports go. After scaffolding it only contains pinky
itself:
library(pinky)
Put all your library imports here you won't miss a library ever again.
This is where data gets loaded, cleaned and pre-processed. It's a long process, we know, that's why you should use optional loading([MORE ON THAT LATER WHEN I ACTUALLY IMPLEMENT IT]:
lol(didnt(implement))
Have all your data living in this file and your life will be slightly betterer.
This is the entry point to the modular structure of your application - use it to link to other files that contain functions and classes you wrote for your project.
Data loading can be quite long for big datasets, so a very simple script is provided:
## LOADER
source("includes/loader.R")
## LOAD USER FUNCTIONS
save.image("snapshot.Rdata")
The idea being that data and dependencies change quite infrequently, so it's a lot easier to keep raw data untouched and dirty and all the transformations it has to undergo in the data.R
. To avoid running those transformations everytime(they can be quite costly), loader.R
will try to avoid running data.R
if there's a snapshot.Rdata
present in the root folder. Neat.
It's a good idea to store data the way it came from the source(or as close to it as possible) and keep track of all the applied transformations. Using binary snapshots helps save time on running those tranformations every single time.
inject
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.