View source: R/pmut.base.prep.R
This function takes the meta information generated by pmut.base.find and prepares new data so that it can be scored without error.
It performs the following steps:
it imputes missing values, assigning the base level for categorical columns and the mean value for numeric columns,
and at the same time generates a new column marking the imputed numeric entries;
it assigns levels observed in the new data but not found in the meta to the base level;
it handles levels found in the meta but not observed in the new data by treating the column as a factor;
it handles an entire column found in the meta but missing from the new data by filling the whole column with its base level or mean;
it attaches the symbol "!" to every base level; lastly, it orders the columns alphabetically.
Note that data processed by this function has only two column classes: factor for categorical features and numeric for numeric features.
model.matrix will then produce a data matrix in exactly the same format as the training data,
so that it can be scored by a glmnet or xgboost model.
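The steps above can be illustrated with a small base-R sketch. This is not the pmut implementation: the helper prep_sketch, the layout of cat_meta and num_meta, the "_imputed" column suffix, and the choice to put "!" in front of the base level are all assumptions made for illustration, and the real output of pmut.base.find may be structured differently.

```r
# Hypothetical meta structures (assumed layout, not pmut's actual format).
cat_meta <- list(color = list(base = "G", levels = c("D", "E", "G")))
num_meta <- list(price = list(mean = 100), depth = list(mean = 60))

prep_sketch <- function(data, cat_meta, num_meta) {
  for (nm in names(cat_meta)) {
    base <- cat_meta[[nm]]$base
    lv   <- cat_meta[[nm]]$levels
    # A column missing from the new data is filled entirely with the base level.
    x <- if (nm %in% names(data)) as.character(data[[nm]]) else rep(base, nrow(data))
    # Missing values and levels unseen in the meta fall back to the base level.
    x[is.na(x) | !(x %in% lv)] <- base
    # Mark the base level with "!" (prefix position is an assumption).
    marked <- paste0("!", base)
    x[x == base] <- marked
    # Keep every level from the meta, even if it is not observed in the new data.
    data[[nm]] <- factor(x, levels = c(marked, setdiff(lv, base)))
  }
  for (nm in names(num_meta)) {
    m <- num_meta[[nm]]$mean
    x <- if (nm %in% names(data)) as.numeric(data[[nm]]) else rep(NA_real_, nrow(data))
    flag <- is.na(x)
    x[flag] <- m                                          # mean imputation
    data[[nm]] <- x
    data[[paste0(nm, "_imputed")]] <- as.numeric(flag)    # marker column (assumed name)
  }
  # Order the columns alphabetically.
  data[, sort(names(data)), drop = FALSE]
}

new_data <- data.frame(color = c("D", "NEW", NA), price = c(90, NA, 110),
                       stringsAsFactors = FALSE)
out <- prep_sketch(new_data, cat_meta, num_meta)
sapply(out, class)
```

In this sketch the unseen level "NEW" and the missing color both become the marked base level, the missing price is replaced by the stored mean, and the absent depth column is created from its mean, so every column ends up as either factor or numeric.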
pmut.base.prep(DATA, CatMeta, NumMeta)
DATA | Object of class
CatMeta | List of meta information for categorical features generated by pmut.base.find
NumMeta | List of meta information for numeric features generated by pmut.base.find
A data.frame or data.table ready to be scored
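Why "ready to be scored" depends on consistent factor levels can be seen in a short base-R sketch. The toy data frames below are invented for illustration and do not use pmut; they only show that model.matrix builds its columns from the factor levels, so new data must carry the same levels as the training data.

```r
# Toy training data with three color levels.
train <- data.frame(color = factor(c("D", "E", "G"), levels = c("D", "E", "G")),
                    price = c(1, 2, 3))
mm_train <- model.matrix(~ ., data = train)

# New data in which only two of the three levels are observed.
new_raw <- data.frame(color = c("D", "E"), price = c(4, 5),
                      stringsAsFactors = FALSE)

# Building the factor from observed values alone drops the unseen level,
# so the design matrix loses a column.
new_bad <- transform(new_raw, color = factor(color))
mm_bad  <- model.matrix(~ ., data = new_bad)

# Reusing the training levels keeps the column layout aligned.
new_good <- transform(new_raw, color = factor(color, levels = levels(train$color)))
mm_good  <- model.matrix(~ ., data = new_good)

colnames(mm_train)
colnames(mm_bad)
colnames(mm_good)
```

A model fitted on mm_train expects one coefficient per column, which is why the column names of the scoring matrix have to match those of the training matrix.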
temp = pmut.base.find(data.frame(ggplot2::diamonds))
# remove two columns
newdata = data.frame(ggplot2::diamonds)[,-c(2,6)]
# generate na
newdata$price[5:15] = NA
# assign new color
newdata$color = "NEW"
# temp[[1]] categorical meta, temp[[2]] numeric meta
newdata = pmut.base.prep(newdata, temp[[1]], temp[[2]])
head(newdata)
sapply(newdata, class)