mungebits)


One can use munge to take a data.frame, apply a given set of transformations, and persistently store the operations on the data.frame, ready to run on a future data.frame.

munge(dataframe, ..., stagerunner = FALSE, train_only = FALSE)

`dataframe`	a data set to operate on.
`...`	usually a list specifying the necessary operations (see examples).
`stagerunner`	logical or list. Whether to run the munge procedure or return the parametrizing stageRunner object (see package stagerunner). If a list, one can specify `remember = TRUE` to pass to the stageRunner initializer.
`train_only`	logical. Whether to leave the `trained` parameter on each mungebit set to `TRUE` or `FALSE` accordingly. For example, if `stagerunner = TRUE` and we plan to re-use the stagerunner for prediction, it makes sense to leave the mungebits untrained. (Note that this prevents the predict functions from being run!)
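The role of the `trained` flag can be illustrated with a minimal base R sketch. This is not the mungebits implementation: `make_bit` and its `run` method are hypothetical names, and the sketch assumes that `train_only = TRUE` means the flag is left at `FALSE` after a run, so the train function is used again next time.

```r
# Hypothetical stand-in for a mungebit: pairs a train function with a
# predict function and tracks which one should run next.
make_bit <- function(train_fn, predict_fn) {
  bit <- new.env()
  bit$trained <- FALSE
  bit$run <- function(x, train_only = FALSE) {
    # Once trained, always use the predict function.
    if (bit$trained) return(predict_fn(x))
    # Assumption: train_only = TRUE leaves the bit untrained.
    if (!train_only) assign("trained", TRUE, envir = bit)
    train_fn(x)
  }
  bit
}

bit <- make_bit(function(x) 2 * x, function(x) 3 * x)
first  <- bit$run(c(1, 2), train_only = TRUE)  # train function, flag stays FALSE
second <- bit$run(c(1, 2))                     # train function, flag becomes TRUE
third  <- bit$run(c(1, 2))                     # predict function
```

Under these assumptions, the predict function is never reached while `train_only = TRUE` is passed, which matches the note above about predict functions not being run.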

A data.frame that has had the specified operations applied to it, along with an additional attribute `mungepieces` that records the history of applied functions. These can be used to reproduce the transformations on, e.g., a dataset that needs to have a prediction run.
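The idea of storing the applied operations on the result and replaying them later can be sketched in base R alone. The helper `apply_and_record` and the attribute name `recorded_steps` are hypothetical and chosen for illustration; they are not part of mungebits, which stores its history under `mungepieces`.

```r
# Hypothetical helper: apply each step to its column, then attach the
# list of steps to the result as an attribute.
apply_and_record <- function(df, steps) {
  for (step in steps) {
    df[[step$column]] <- step$fn(df[[step$column]])
  }
  attr(df, "recorded_steps") <- steps
  df
}

steps <- list(list(fn = function(x) 2 * x, column = "Sepal.Length"))
train <- apply_and_record(iris, steps)

# Replay the recorded steps on a later data.frame.
new_data <- head(iris, 10)
scored <- apply_and_record(new_data, attr(train, "recorded_steps"))
stopifnot(all(scored[["Sepal.Length"]] == new_data[["Sepal.Length"]] * 2))
```

Unlike this sketch, munge also distinguishes training from prediction, so a replayed step may run a different function than the one used the first time.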

## Not run: 
iris2 <- munge(iris,
  list(column_transformation(function(x) 2 * x), 'Sepal.Length'))
stopifnot(iris2[['Sepal.Length']] == iris[['Sepal.Length']] * 2)

iris2 <- munge(iris,
   # train function & predict function
   list(c(column_transformation(function(x) 2 * x),
        column_transformation(function(x) 3 * x)),
   # arguments to pass to transformation, i.e. column names in this case
   'Sepal.Length'))
stopifnot(iris2[['Sepal.Length']] == iris[['Sepal.Length']] * 2)
iris3 <- munge(iris, attr(iris2, 'mungepieces'))
# Use the transformations ("mungepieces") stored on iris2 to produce iris3.
# They will remember that they've been trained already and run the
# prediction routine instead of the training routine. Note the above is
# also equivalent to the shortcut: munge(iris, iris2)
stopifnot(iris3[['Sepal.Length']] == iris[['Sepal.Length']] * 3)

## End(Not run)