The R class types of response variables play a central role in their analysis with the package. They determine, for example, the specific models that can be fit, fitting algorithms employed, predicted values produced, and applicable performance metrics and analyses. As described in the following sections, factors, ordered factors, numeric vectors and matrices, and survival objects are supported by the package.
Categorical responses with two or more levels should be coded as a factor
variable for analysis. Prediction is of factor levels by default and of level-specific probabilities if type = "prob"
.
## Iris flowers species (3-level factor) model_fit <- fit(Species ~ ., data = iris, model = GBMModel) predict(model_fit) %>% head predict(model_fit, type = "prob") %>% head
In the case of a binary factor, the second factor level is treated as the event and the level for which predicted probabilities are computed.
## Pima Indians diabetes statuses (binary factor) data(Pima.te, package = "MASS") data(Pima.tr, package = "MASS") model_fit <- fit(type ~ ., data = Pima.tr, model = GBMModel) predict(model_fit, newdata = Pima.te) %>% head predict(model_fit, newdata = Pima.te, type = "prob") %>% head
Categorical responses can be designated as having ordered levels by storing them as an ordered
factor variable. For categorical vectors, this can be accomplished with the factor()
function and its argument ordered = TRUE
or more simply with the ordered()
function. Numeric vectors can be converted to ordered factors with the cut()
function.
## Iowa City housing prices (ordered factor) df <- within(ICHomes, { sale_amount <- cut(sale_amount, breaks = 3, labels = c("Low", "Medium", "High"), ordered_result = TRUE) }) model_fit <- fit(sale_amount ~ ., data = df, model = GBMModel) predict(model_fit) %>% head predict(model_fit, type = "prob") %>% head
Code univariate numerical responses as a numeric
variable. Predicted numeric values are of the original type (integer or double) by default, and doubles if type = "numeric"
.
## Iowa City housing prices model_fit <- fit(sale_amount ~ ., data = ICHomes, model = GBMModel) predict(model_fit) %>% head predict(model_fit, type = "numeric") %>% head
Store multivariate numerical responses as a numeric matrix
variable for model fitting with traditional formulas and model frames.
## Anscombe's multiple regression models dataset ## Numeric matrix response formula model_fit <- fit(cbind(y1, y2, y3) ~ x1, data = anscombe, model = LMModel) predict(model_fit) %>% head
For recipes, the multiple response may be defined on the left hand side of a recipe formula or as a single variable within a data frame.
## Numeric matrix response recipe ## Defined in a recipe formula rec <- recipe(y1 + y2 + y3 ~ x1, data = anscombe) ## Defined within a data frame df <- within(anscombe, { y <- cbind(y1, y2, y3) remove(y1, y2, y3) }) rec <- recipe(y ~ x1, data = df) model_fit <- fit(rec, model = LMModel) predict(model_fit) %>% head
Censored time-to-event survival responses should be stored as a Surv
variable for model fitting with traditional formulas and model frames.
## Survival response formula library(survival) fit(Surv(time, status) ~ ., data = veteran, model = GBMModel)
For recipes, survival responses may be defined with the individual survival time and event variables given on the left hand side of a recipe formula and their roles designated with the r rdoc_url("role_surv()", "recipe_roles")
function or as a single Surv
variable within a data frame.
## Survival response recipe ## Defined in a recipe formula rec <- recipe(time + status ~ ., data = veteran) %>% role_surv(time = time, event = status) ## Defined within a data frame df <- within(veteran, { y <- Surv(time, status) remove(time, status) }) rec <- recipe(y ~ ., data = df) fit(rec, model = GBMModel)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.