Variable specification defines the relationship between response and predictor variables as well as the data used to estimate the relationship. Four main types of specifications are supported by the fit
and resample
functions: traditional formula, design matrix, model frame, and preprocessing recipe.
Different variable specifications will be illustrated with the Iowa City home prices dataset from the package. To start, a formula is defined relating sale amount to home characteristics. Two equivalent definitions are given. One in which predictors on the right hand side of the equation are explicitly included, and another in which .
notation is used to indicate that all remaining variable not already in the model be included on the right hand side.
## Analysis library library(MachineShop) ## Iowa City home prices dataset str(ICHomes) ## Formula: explicit inclusion of predictor variables fo <- sale_amount ~ sale_year + sale_month + built + style + construction + base_size + add_size + garage1_size + garage2_size + lot_size + bedrooms + basement + ac + attic + lon + lat ## Formula: implicit inclusion of predictor (. = remaining) variables fo <- sale_amount ~ .
Traditional formula calls to the fitting functions consist of a formula and dataset pair. This specification additionally allows for crossing (*
), interaction (:
), and removal (-
) of predictors in the formula; in-line functions of response variables; and some in-line functions of predictors.
model_fit <- fit(fo, ICHomes, model = TunedModel(SVMRadialModel)) tuned_model <- as.MLModel(model_fit) model_res <- resample(fo, ICHomes, model = tuned_model) summary(model_res)
Support is provided for calls with a numeric design matrix and response object pair. The design matrix approach has lower computational overhead than the others and can thus enable a larger number of predictors to be included in an analysis.
x <- as.matrix(ICHomes[c("built", "base_size", "lot_size", "bedrooms")]) y <- ICHomes$sale_amount model_fit <- fit(x, y, model = TunedModel(SVMRadialModel)) tuned_model <- as.MLModel(model_fit) model_res <- resample(x, y, model = tuned_model) summary(model_res)
Model frames are created with the traditional formula or design matrix syntax described above and then passed to the fitting functions. They allow for the specification of variables for stratified resampling or for weighting of cases in model fitting.
mf <- ModelFrame(fo, data = ICHomes, strata = ICHomes$sale_amount) model_fit <- fit(mf, model = TunedModel(SVMRadialModel)) tuned_model <- as.MLModel(model_fit) model_res <- resample(mf, model = tuned_model) summary(model_res)
Preprocessing recipes provide a flexible framework for defining predictor and response variables as well as preprocessing steps to be applied to them prior to model fitting. Using recipes helps ensure that estimation of predictive performance accounts for all modeling step. As with model frames, the recipes approach allows for specification of case strata and weights.
library(recipes) rec <- recipe(fo, data = ICHomes) %>% role_case(stratum = sale_amount) %>% step_center(base_size, add_size, garage1_size, garage2_size, lot_size) %>% step_pca(base_size, add_size, garage1_size, garage2_size, lot_size, num_comp = 2) %>% step_dummy(all_nominal_predictors()) model_Fit <- fit(rec, model = TunedModel(SVMRadialModel)) tuned_model <- as.MLModel(model_fit) model_res <- resample(rec, model = tuned_model) summary(model_res)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.