View source: R/pmut.base.prep.R
This function takes the meta information generated by pmut.base.find and prepares new data so that it can be scored without error.
It performs the following steps. Missing values are imputed with the base level (categorical) or the mean value (numeric); a new column marking the imputed numeric entries is generated at the same time. Levels observed in the new data but not found in the meta are assigned to the base level. Levels found in the meta but not observed in the new data are retained by treating the column as a factor. Columns found in the meta but missing entirely from the new data are created and filled with their base level or mean. The symbol "!" is attached to every base level. Lastly, the columns are ordered alphabetically.
Note that data processed by this function has only two classes: factor for categorical features and numeric for numeric features. model.matrix will then produce a data matrix in exactly the same format as the training data, so that it can be scored by a glmnet or xgboost model.
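The steps described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the metadata layout (cat_meta, num_meta), the helper prep_sketch, the "_imputed" column suffix, and the choice to prefix "!" are all assumptions made for illustration only.

```r
# Hypothetical metadata; the real structure returned by pmut.base.find may differ.
cat_meta <- list(color = list(base = "G", levels = c("D", "E", "G")))
num_meta <- list(price = list(mean = 100), carat = list(mean = 0.5))

# New data with a missing value, an unseen level ("Z"), and no 'carat' column.
new_data <- data.frame(
  color = c("D", NA, "Z", "G"),
  price = c(90, NA, 110, 100),
  stringsAsFactors = FALSE
)

prep_sketch <- function(data, cat_meta, num_meta) {
  n <- nrow(data)
  for (col in names(cat_meta)) {
    meta <- cat_meta[[col]]
    # A column absent from the new data is treated as entirely missing.
    x <- if (col %in% names(data)) as.character(data[[col]]) else rep(NA_character_, n)
    # Missing values and unseen levels both fall back to the base level.
    x[is.na(x) | !(x %in% meta$levels)] <- meta$base
    # Mark the base level with "!" (assumed here to be a prefix).
    marked_base <- paste0("!", meta$base)
    x[x == meta$base] <- marked_base
    # Fix the level set from the metadata so unobserved levels are kept.
    data[[col]] <- factor(x, levels = c(marked_base, setdiff(meta$levels, meta$base)))
  }
  for (col in names(num_meta)) {
    x <- if (col %in% names(data)) as.numeric(data[[col]]) else rep(NA_real_, n)
    miss <- is.na(x)
    x[miss] <- num_meta[[col]]$mean
    data[[col]] <- x
    # Flag column marking imputed entries (the suffix is a hypothetical name).
    data[[paste0(col, "_imputed")]] <- as.numeric(miss)
  }
  # Order columns alphabetically; radix sorting avoids locale differences.
  data[, sort(names(data), method = "radix"), drop = FALSE]
}

prepared <- prep_sketch(new_data, cat_meta, num_meta)
sapply(prepared, class)
```

In this sketch every resulting column is either a factor or numeric, and the factor keeps the level "E" even though it never appears in the new data.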
pmut.base.prep(DATA, CatMeta, NumMeta)
DATA |
Object of class |
CatMeta |
List of meta information for categorical features generated by |
NumMeta |
List of meta information for numeric features generated by |
A data.frame or data.table ready to be scored.
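To illustrate why fixed factor levels matter for scoring, the following sketch uses only base R's model.matrix. The toy data frames and the level set lv are assumptions for illustration; they stand in for training data and for new data whose factor levels have been fixed in advance.

```r
# Shared level set, with the base level marked by "!" and listed first.
lv <- c("!G", "D", "E")

train <- data.frame(
  color = factor(c("!G", "D", "E", "D"), levels = lv),
  price = c(1, 2, 3, 4)
)

# The new data never contains "E", but the factor still carries that level.
new <- data.frame(
  color = factor(c("D", "!G"), levels = lv),
  price = c(5, 6)
)

x_train <- model.matrix(~ ., data = train)
x_new <- model.matrix(~ ., data = new)

# Both design matrices have the same columns, in the same order.
identical(colnames(x_train), colnames(x_new))
```

Because the level "E" is retained in the factor, the new design matrix still contains a column for it, filled with zeros, so the matrix lines up with the one used in training.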
# build meta information from the training data
temp = pmut.base.find(data.frame(ggplot2::diamonds))
# remove two columns (cut and table) to simulate missing columns
newdata = data.frame(ggplot2::diamonds)[,-c(2,6)]
# introduce missing values in a numeric column
newdata$price[5:15] = NA
# overwrite color with a level not seen in the training data
newdata$color = "NEW"
# temp[[1]] is the categorical meta, temp[[2]] is the numeric meta
newdata = pmut.base.prep(newdata, temp[[1]], temp[[2]])
head(newdata)
sapply(newdata, class)