pmut.base.prep: Process Data to be Scored with Meta Information
In chengjunhou/pmut: Predictive Modeling Utility-function Toolkit

View source: R/pmut.base.prep.R

This function uses the meta information generated by pmut.base.find to prepare new data so that it can be scored without error. It performs the following steps. Missing values are imputed with the base level for categorical features and with the mean for numeric features; a new column marking the imputed numeric entries is generated at the same time. Levels observed in the new data but not found in the meta are assigned to the base level. Levels found in the meta but not observed in the new data are retained by treating the column as a factor. A column found in the meta but absent from the new data is filled entirely with its base level or mean. The symbol "!" is attached to every base level. Finally, the columns are ordered alphabetically. Data processed by this function contains only two classes: factor for categorical features and numeric for numeric features. As a result, model.matrix produces a data matrix in exactly the same format as the training data, so the new data can be scored by a glmnet or xgboost model.
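The steps above can be illustrated with a minimal base R sketch. This is not the package's implementation: the meta structures (`cat_meta`, `num_meta`), the helper `prep_sketch`, and the `_NA` suffix for the indicator column are all hypothetical, and the real output of pmut.base.find may be organized differently. The "!" marker on base levels is left out because its exact placement is not described here.

```r
# Hypothetical meta structures; the real output of pmut.base.find may differ.
cat_meta <- list(color = list(levels = c("D", "E", "F"), base = "D"))
num_meta <- list(price = list(mean = 100), depth = list(mean = 60))

# New data: a missing price, an unseen color level, and no depth column at all.
new_data <- data.frame(
  color = c("E", "NEW", NA),
  price = c(90, NA, 110),
  stringsAsFactors = FALSE
)

prep_sketch <- function(data, cat_meta, num_meta) {
  for (col in names(cat_meta)) {
    m <- cat_meta[[col]]
    # Column absent from the new data: fill it entirely with the base level.
    x <- if (col %in% names(data)) as.character(data[[col]]) else rep(m$base, nrow(data))
    # Missing values and unseen levels both fall back to the base level.
    x[is.na(x) | !(x %in% m$levels)] <- m$base
    # Fixing the level set keeps levels that the new data never shows.
    data[[col]] <- factor(x, levels = m$levels)
  }
  for (col in names(num_meta)) {
    m <- num_meta[[col]]
    # Column absent from the new data: treat every entry as missing.
    x <- if (col %in% names(data)) as.numeric(data[[col]]) else rep(NA_real_, nrow(data))
    # Indicator column marking imputed entries (the "_NA" suffix is an assumption).
    data[[paste0(col, "_NA")]] <- as.numeric(is.na(x))
    x[is.na(x)] <- m$mean
    data[[col]] <- x
  }
  # Order columns alphabetically.
  data[, order(names(data)), drop = FALSE]
}

prepared <- prep_sketch(new_data, cat_meta, num_meta)
print(prepared)
print(sapply(prepared, class))
```

In this sketch a column that is missing entirely is flagged as imputed in every row; whether pmut.base.prep does the same is not stated in this page.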

pmut.base.prep(DATA, CatMeta, NumMeta)

`DATA`	Object of class `data.frame` or `data.table`
`CatMeta`	List of meta information for categorical features generated by `pmut.base.find`
`NumMeta`	List of meta information for numeric features generated by `pmut.base.find`

A data.frame or data.table ready to be scored
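To see why fixed factor levels matter for scoring, the following base R sketch compares model.matrix output for new data with and without the training level set. The toy data and variable names are hypothetical and only illustrate the general behavior of model.matrix, not the internals of pmut.

```r
# Training data with three color levels.
levels_train <- c("D", "E", "F")
train <- data.frame(
  color = factor(c("D", "E", "F"), levels = levels_train),
  price = c(1, 2, 3)
)

# New data that only contains two of the three levels.
new_raw <- data.frame(color = factor(c("E", "F")), price = c(5, 6))

# Same new data, but with the factor levels fixed to the training level set.
new_fixed <- data.frame(
  color = factor(c("E", "F"), levels = levels_train),
  price = c(5, 6)
)

mm_train <- model.matrix(~ ., data = train)
mm_raw   <- model.matrix(~ ., data = new_raw)
mm_fixed <- model.matrix(~ ., data = new_fixed)

# Without fixed levels the dummy columns differ from training.
print(colnames(mm_train))
print(colnames(mm_raw))
# With fixed levels the columns match the training matrix.
print(colnames(mm_fixed))
```

Because the design matrix built from `new_fixed` has the same columns in the same order as the training matrix, it can be passed to a fitted model that expects that layout.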

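# build meta information from the training data
temp = pmut.base.find(data.frame(ggplot2::diamonds))
# remove two columns (cut and table) so they are missing from the new data
newdata = data.frame(ggplot2::diamonds)[,-c(2,6)]
# generate NAs in a numeric column
newdata$price[5:15] = NA
# overwrite color with a level not found in the meta
newdata$color = "NEW"
# temp[[1]] categorical meta, temp[[2]] numeric meta
newdata = pmut.base.prep(newdata, temp[[1]], temp[[2]])
head(newdata)
# every column should be either factor or numeric
sapply(newdata, class)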