munge: General-purpose data munging


Description

munge takes a data.frame, applies a given set of transformations, and persistently stores those operations on the data.frame so that they are ready to run on a future data.frame.

Usage

munge(dataframe, ..., stagerunner = FALSE, train_only = FALSE)

Arguments

dataframe

a data set to operate on.

...

usually a list specifying the necessary operations (see examples).

stagerunner

logical or list. Whether to run the munge procedure or return the parametrizing stageRunner object (see package stagerunner). If a list, one can specify remember = TRUE to pass to the stageRunner initializer.

train_only

logical. Whether to leave the trained parameter on each mungebit set to TRUE or FALSE accordingly. For example, if stagerunner = TRUE and we are planning to re-use the stagerunner for prediction, it makes sense to leave the mungebits untrained. (Note that this will prevent one from being able to run the predict functions!)

Value

A data.frame to which the specified operations have been applied, along with an additional attribute mungepieces that records the history of applied functions. These can be used to reproduce the transformations on, e.g., a dataset that needs to have a prediction run.
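
The idea of carrying the applied operations along with the returned data.frame can be illustrated in base R alone. The following is only a minimal sketch of the concept, not how mungebits implements it: the helper apply_and_record and the attribute name recorded_steps are hypothetical names invented for this illustration.

```r
# Hypothetical helper, NOT part of mungebits: applies each step to the named
# columns and records the steps as an attribute so they can be replayed later.
apply_and_record <- function(df, steps) {
  for (step in steps) {
    for (col in step$columns) {
      df[[col]] <- step$fn(df[[col]])
    }
  }
  attr(df, "recorded_steps") <- steps
  df
}

# One step: double the Sepal.Length column.
steps <- list(list(fn = function(x) 2 * x, columns = "Sepal.Length"))
iris2 <- apply_and_record(iris, steps)

# Replay the recorded steps on a different data.frame.
new_data  <- head(iris, 10)
new_data2 <- apply_and_record(new_data, attr(iris2, "recorded_steps"))
stopifnot(all(new_data2[["Sepal.Length"]] == new_data[["Sepal.Length"]] * 2))
```

Storing the steps as an attribute keeps the history attached to the data itself, which is what makes a call such as munge(iris, attr(iris2, 'mungepieces')) possible in the real package.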

Examples

## Not run: 
iris2 <- munge(iris,
  list(column_transformation(function(x) 2 * x), 'Sepal.Length'))
stopifnot(iris2[['Sepal.Length']] == iris[['Sepal.Length']] * 2)

iris2 <- munge(iris,
   # train function & predict function
   list(c(column_transformation(function(x) 2 * x),
        column_transformation(function(x) 3 * x)),
   # arguments to pass to transformation, i.e. column names in this case
   'Sepal.Length'))
stopifnot(iris2[['Sepal.Length']] == iris[['Sepal.Length']] * 2)
iris3 <- munge(iris, attr(iris2, 'mungepieces'))
# Reuse the transformations ("mungepieces") stored on iris2 to produce iris3.
# They remember that they have already been trained, so they run the
# prediction routine instead of the training routine. Note that the above
# is also equivalent to the shortcut: munge(iris, iris2)
stopifnot(iris3[['Sepal.Length']] == iris[['Sepal.Length']] * 3)

## End(Not run)
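
The train-then-predict behaviour in the second example can be pictured with a small closure in base R. This is a conceptual sketch only, under the assumption that a mungebit behaves roughly like a pair of functions plus a trained flag; make_bit, run and is_trained are hypothetical names and are not part of the mungebits API.

```r
# Hypothetical constructor, NOT the mungebits API: pairs a train function with
# a predict function and flips a `trained` flag after the first run.
make_bit <- function(train_fn, predict_fn) {
  trained <- FALSE
  list(
    run = function(x) {
      if (!trained) {
        trained <<- TRUE   # remember that training has happened
        train_fn(x)
      } else {
        predict_fn(x)
      }
    },
    is_trained = function() trained
  )
}

bit <- make_bit(function(x) 2 * x, function(x) 3 * x)
first  <- bit$run(iris[["Sepal.Length"]])  # first run uses the train function
second <- bit$run(iris[["Sepal.Length"]])  # later runs use the predict function
stopifnot(all(first  == iris[["Sepal.Length"]] * 2))
stopifnot(all(second == iris[["Sepal.Length"]] * 3))
```

This mirrors why iris2 above is doubled while iris3 is tripled: the same stored transformation is run twice, and the second run sees that training has already taken place.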

robertzk/mungebits documentation built on May 27, 2019, 10:35 a.m.