pmut.base.prep: Process Data to be Scored with Meta Information


View source: R/pmut.base.prep.R

Description

This function takes the meta information generated by pmut.base.find and prepares new data so that it can be scored without error. It performs the following steps: it imputes missing values, assigning the base level to categorical features and the mean value to numeric features, and in the meantime generates a new column marking the imputed numeric entries; it assigns levels observed in the new data but not found in the meta information to the base level; it handles levels found in the meta information but not observed in the new data by treating the column as a factor; it handles an entire column found in the meta information but missing from the new data by filling the whole column with its base level or mean; it attaches the symbol "!" to every base level; lastly, it orders the columns alphabetically. Data processed by this function has only two column classes: factor for categorical features and numeric for numeric features. As a result, model.matrix produces a data matrix in exactly the same format as the training data, so that it can be scored by a glmnet or xgboost model.
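The idea behind these steps can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the data frames train and new, the variables cat_levels, base_level and num_mean, and the indicator column name price_NA are all hypothetical and only stand in for what the meta information might contain.

```r
# Hypothetical training and new data (not taken from the pmut package)
train <- data.frame(
  color = factor(c("D", "E", "F", "E")),
  price = c(10, 20, 30, 40)
)
new <- data.frame(
  color = c("E", "NEW", NA),
  price = c(15, NA, 25),
  stringsAsFactors = FALSE
)

# Assumed meta information: training levels, a base level, the training mean
cat_levels <- levels(train$color)
base_level <- "E"                 # assumption: some level chosen as the base
num_mean   <- mean(train$price)

# Numeric: flag the missing entries first, then impute with the training mean
new$price_NA <- as.numeric(is.na(new$price))
new$price[is.na(new$price)] <- num_mean

# Categorical: send missing or unseen values to the base level,
# then fix the level set to the one seen in training
new$color[is.na(new$color) | !(new$color %in% cat_levels)] <- base_level
new$color <- factor(new$color, levels = cat_levels)

# With the same level set, model.matrix yields the same dummy columns
train_cols <- colnames(model.matrix(~ color, data = train))
new_cols   <- colnames(model.matrix(~ color, data = new))
identical(train_cols, new_cols)
```

Because the factor in the new data carries every training level, even those that do not occur in it, the dummy columns line up with the training matrix, which is what a fitted glmnet or xgboost model expects at scoring time.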

Usage

pmut.base.prep(DATA, CatMeta, NumMeta)

Arguments

DATA

Object of class data.frame or data.table

CatMeta

List of meta information for categorical features generated by pmut.base.find

NumMeta

List of meta information for numeric features generated by pmut.base.find

Value

A data.frame or data.table ready to be scored

Examples

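# build the meta information from the full training data
temp = pmut.base.find(data.frame(ggplot2::diamonds))
# remove two columns (cut and table)
newdata = data.frame(ggplot2::diamonds)[,-c(2,6)]
# introduce missing values in a numeric column
newdata$price[5:15] = NA
# overwrite color with a level not found in the meta information
newdata$color = "NEW"
# temp[[1]] is the categorical meta, temp[[2]] is the numeric meta
newdata = pmut.base.prep(newdata, temp[[1]], temp[[2]])
head(newdata)
# every column should now be either factor or numeric
sapply(newdata, class)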

chengjunhou/pmut documentation built on May 23, 2019, 4:24 p.m.