This function creates a bagged ensemble of multivariate adaptive regression splines (MARS) models from a given dataset. It effectively acts as a wrapper for the earth package. MARS is a very versatile regression model that performs feature selection by adding and then pruning model terms, optionally guided by cross-validation.
ensemble_mars(y_index, train, valid_size = NULL, test = NULL,
  pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
  family = c("gaussian", "binomial", "poisson"),
  type = c("link", "response", "earth", "class", "terms", "poisson"),
  degree = 1, nprune = NULL, nfold = 0, n = 10, r = NULL, r_replace = FALSE,
  c = NULL, c_replace = FALSE, plots = FALSE, seed = TRUE)
y_index |
A column index representing the response variable of the model. |
train |
A dataset for the MARS model to be trained on. The column order and names of the training set should exactly match those of the test set. |
valid_size |
A natural number indicating the number of observations to be randomly sampled from the training data for model validation. |
test |
A dataset for the MARS model to predict for. The column order and names of the test set should exactly match those of the training set. |
pmethod |
Pruning method. One of: "backward", "none", "exhaustive", "forward", "seqrep" or "cv". Default is "backward". |
family |
A character object indicating the type of response variable in the model. One of "gaussian", "binomial" or "poisson". Default is "gaussian". |
type |
The type of prediction required. One of "link", "response", "earth", "class" or "terms". Default is "link". |
degree |
An optional integer specifying the maximum interaction degree. Default is 1. |
nprune |
An optional integer specifying the maximum number of model terms. Default is NULL. |
nfold |
Number of cross-validation folds. Default is 0. |
n |
A natural number indicating the number of MARS models to be built. Default is 10. |
r |
The number of rows to be bagged. Note r < nrow(train). |
r_replace |
A logical object indicating whether rows are sampled with replacement when bagging. Default is FALSE. |
c |
The number of columns to be bagged. Note c < ncol(train). |
c_replace |
A logical object indicating whether columns are sampled with replacement when bagging. Default is FALSE. |
plots |
A logical object indicating whether plots should be constructed for each bagged model. |
seed |
A logical object indicating whether a random seed should be set. |
file_name |
A character object indicating the file name to use when saving the data frame. Default is NULL. The name must include the .csv suffix. |
directory |
A character object specifying the directory where the data frame is to be saved as a .csv file. |
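The row and column bagging controlled by r, r_replace, c and c_replace can be pictured with a short base R sketch. This is only an illustration under stated assumptions, not the package's implementation: bag_indices is a hypothetical helper, and treating NULL as "use every row or column" is a guess at the intended behaviour.

```r
# Hypothetical helper (not part of the package): draw the row and column
# indices used to build one bagged model.
bag_indices <- function(train, y_index, r = NULL, c = NULL,
                        r_replace = FALSE, c_replace = FALSE) {
  # Candidate predictor columns exclude the response column.
  predictors <- setdiff(seq_len(ncol(train)), y_index)

  # Assumption: NULL means "no bagging", i.e. keep all rows / all predictors.
  rows <- if (is.null(r)) {
    seq_len(nrow(train))
  } else {
    sample.int(nrow(train), size = r, replace = r_replace)
  }
  cols <- if (is.null(c)) {
    predictors
  } else {
    predictors[sample.int(length(predictors), size = c, replace = c_replace)]
  }

  list(rows = rows, cols = cols)
}

set.seed(1)
idx <- bag_indices(iris, y_index = 1, r = 100, c = 2)

# Bagged training set: the response column plus the sampled predictors.
bagged <- iris[idx$rows, c(1, idx$cols)]
dim(bagged)
```

Sampling without replacement (the documented defaults) gives each model a distinct subset of the data, while setting r_replace = TRUE would correspond to a classical bootstrap sample of the rows.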
Outputs a list of information related to the ensemble MARS model. The first object of the list is a data frame of the response observations, the corresponding predictions and the error associated with each prediction. The second object of the list is a data frame of model performance metrics. The third object of the list is a vector of predictions / classifications for the specified test set.
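To make the shape of that return value concrete, the following base R sketch averages the predictions of several bagged models and assembles a list with the same three parts. It rests on several assumptions: lm() stands in for earth::earth() so the sketch runs without extra packages, the element names, the error definition (observed minus predicted) and the two metrics are hypothetical, and the validation rows are simply held out rather than sampled by the function.

```r
set.seed(42)
data  <- iris[sample(150), 1:4]   # numeric columns only; response is column 1
train <- data[1:100, ]
valid <- data[101:125, ]          # held-out rows used for validation
test  <- data[126:150, -1]        # test set without the response column

n <- 10                           # number of bagged models
fits <- lapply(seq_len(n), function(i) {
  rows <- sample.int(nrow(train), size = 80)   # bag 80 rows, no replacement
  lm(Sepal.Length ~ ., data = train[rows, ])   # stand-in for a MARS fit
})

# Ensemble prediction: the mean of the per-model predictions.
ensemble_predict <- function(newdata) {
  rowMeans(sapply(fits, predict, newdata = newdata))
}

valid_pred <- ensemble_predict(valid)

# Hypothetical element names; the package's actual names may differ.
result <- list(
  predictions = data.frame(observed  = valid$Sepal.Length,
                           predicted = valid_pred,
                           error     = valid$Sepal.Length - valid_pred),
  metrics = data.frame(rmse = sqrt(mean((valid$Sepal.Length - valid_pred)^2)),
                       mae  = mean(abs(valid$Sepal.Length - valid_pred))),
  test_predictions = ensemble_predict(test)
)

str(result, max.level = 1)
```

Averaging is the natural aggregation for a numeric response; for a classification response a majority vote over the per-model classes would play the same role.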
ensemble_mars, ensemble_mlr
# Example 1
# Set data
data <- iris[sample(1:150, 150, FALSE), ]
train <- data[1:100, ]
test <- data[101:150, -1]
ensemble_mars(y_index = 1, train = train, test = test, valid_size = 50, family = "gaussian")
# Example 2
# Binomial classification example with the iris data
data <- iris[sample(1:150, size = 150, replace = FALSE), ]
# Dummy encode the Species
data <- derive_variables(dataset = data, type = "dummy", integer = TRUE, return_dataset = TRUE)
# Convert the response variable into a binary factor with two classes
data$Species_setosa <- as.factor(data$Species_setosa)
# Extract the test data (rows 101 to 150), with Species_setosa moved to the front
test <- data[101:150, c(5,1,2,3,4,6,7)]
# Move Species_setosa to the front of the data frame
data <- data[, c(5,1,2,3,4,6,7)]
# Fit a MARS model with no bagging
ensemble_mars(y_index = 1, train = data, test = test, valid_size = 50, family = "binomial", type = "class")
# Example 3
# Poisson prediction with the iris data
counts <- rpois(n = 150, lambda = 3)
data <- iris[sample(1:150, 150, FALSE), ]
data <- cbind(counts, data)
train <- data[1:100, ]
# Drop the response column (counts) from the test set
test <- data[101:150, -1]
ensemble_mars(y_index = 1, train = train, test = test, valid_size = 50, family = "poisson", type = "response")