ensemble_mars: Bagged Ensemble MARS Model.


Description

This function creates a bagged ensemble of multivariate adaptive regression splines (MARS) models from a given dataset. It effectively acts as a wrapper for the earth package. MARS is a very versatile regression model that incorporates feature selection through term pruning and, optionally, cross-validation.
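The idea of a bagged ensemble can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: lm() is swapped in for the earth model so the code runs without extra packages, the names n_models and n_rows are hypothetical stand-ins for the n and r arguments, and averaging the individual predictions is assumed as the way the models are combined.

```r
# Bagging sketch in base R. lm() stands in for earth::earth() so the
# example runs without extra packages; it is not the package's own code.
set.seed(1)
shuffled <- iris[sample(nrow(iris)), 1:4]   # numeric columns only
train <- shuffled[1:100, ]
test  <- shuffled[101:150, ]

n_models <- 10   # plays the role of the n argument (assumption)
n_rows   <- 80   # plays the role of the r argument, n_rows < nrow(train)

# Fit one model per random row sample and predict the test set.
# The result is a matrix with one column of predictions per model.
preds <- sapply(seq_len(n_models), function(i) {
  rows <- sample(nrow(train), n_rows, replace = FALSE)
  fit  <- lm(Sepal.Length ~ ., data = train[rows, ])
  predict(fit, newdata = test)
})

# Combine the models by averaging their predictions (assumed rule).
ensemble_pred <- rowMeans(preds)
rmse <- sqrt(mean((test$Sepal.Length - ensemble_pred)^2))
```

Averaging suits a numeric response; for a classification response a majority vote would be the analogous choice.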

Usage

ensemble_mars(y_index, train, valid_size = NULL, test = NULL,
  pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
  family = c("gaussian", "binomial", "poisson"), type = c("link",
  "response", "earth", "class", "terms", "poisson"), degree = 1,
  nprune = NULL, nfold = 0, n = 10, r = NULL, r_replace = FALSE,
  c = NULL, c_replace = FALSE, plots = FALSE, seed = TRUE)

Arguments

y_index

A column index representing the response variable of the model.

train

A dataset for the MARS model to be trained on. The column order and names of the training set should exactly match those of the test set.

valid_size

A natural number indicating the number of observations to be randomly sampled from the training data for model validation.

test

A dataset for the MARS model to predict for. The column order and names of the test set should exactly match those of the training set.

pmethod

Pruning method. One of: "backward", "none", "exhaustive", "forward", "seqrep" or "cv". Default is "backward".

family

A character object indicating the type of response variable in the model. One of: "gaussian", "binomial" or "poisson". Default is "gaussian".

type

The type of prediction required. One of: "link", "response", "earth", "class" or "terms". Default is "link".

degree

An optional integer specifying the maximum interaction degree. Default is 1.

nprune

An optional integer specifying the maximum number of model terms. Default is NULL.

nfold

Number of cross-validation folds. Default is 0.

n

A natural number indicating the number of MARS models to be built. Default is 10.

r

The number of rows to be bagged. Note r < nrow(train).

r_replace

A logical object indicating whether rows are sampled with replacement when bagging. Default is FALSE.

c

The number of columns to be bagged. Note c < ncol(train).

c_replace

A logical object indicating whether columns are sampled with replacement when bagging. Default is FALSE.

plots

A logical object indicating whether plots should be constructed for each bagged model.

seed

Logical, indicating whether a random seed should be set. Default is TRUE.

file_name

A character object indicating the file name to use when saving the data frame. Default is NULL. The name must include the .csv suffix.

directory

A character object specifying the directory where the data frame is to be saved as a .csv file.
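The resampling arguments (valid_size, r, r_replace, c, c_replace and seed) can be pictured with plain index sampling. The following is a minimal sketch in base R, assuming the validation rows are held out before the bags are drawn and that the response column is always kept; neither detail is confirmed by this page, and the names n_rows and n_cols are hypothetical stand-ins for r and c.

```r
set.seed(42)                      # fixed seed so the samples are reproducible
train   <- iris[, c(1, 2, 3, 4)]  # numeric columns of iris as training data
y_index <- 1                      # response is the first column

# valid_size: hold out 50 randomly chosen training rows for validation.
valid_rows <- sample(nrow(train), size = 50)
valid      <- train[valid_rows, ]
fit_data   <- train[-valid_rows, ]

# r / r_replace: sample the rows used by one bagged model.
n_rows  <- 60
row_idx <- sample(nrow(fit_data), size = n_rows, replace = FALSE)

# c / c_replace: sample predictor columns, always keeping the response.
predictors <- setdiff(seq_len(ncol(fit_data)), y_index)
n_cols     <- 2
col_idx    <- sample(predictors, size = n_cols, replace = FALSE)

# The data one bagged model would be fitted on.
bag <- fit_data[row_idx, c(y_index, col_idx)]
```

Setting replace = TRUE in either sample() call corresponds to r_replace = TRUE or c_replace = TRUE.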

Value

Outputs a list of information related to the ensemble MARS model. The first object of the list is a data frame of the response observations, the corresponding predictions and the error associated with each prediction. The second object of the list is a data frame of model performance metrics. The third object of the list is a vector of predictions / classifications for the specified test set.
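To make the shape of the returned list concrete, here is a mock built by hand in base R. It is a sketch only: the column names, the sign convention of the error (observed minus predicted) and the choice of RMSE and MAE as metrics are assumptions for illustration, and all values are made up rather than produced by ensemble_mars.

```r
# Mock observed and predicted values; in practice these come from the model.
observed  <- c(5.1, 4.9, 6.3, 5.8)
predicted <- c(5.0, 5.2, 6.0, 5.9)

# First element: observations, predictions and prediction error.
pred_df <- data.frame(observed  = observed,
                      predicted = predicted,
                      error     = observed - predicted)

# Second element: performance metrics (RMSE and MAE chosen for illustration).
metrics_df <- data.frame(metric = c("RMSE", "MAE"),
                         value  = c(sqrt(mean(pred_df$error^2)),
                                    mean(abs(pred_df$error))))

# Third element: predictions for a test set (mock values).
test_pred <- c(5.4, 6.1)

# Elements are accessed by position, e.g. result[[2]] for the metrics.
result <- list(pred_df, metrics_df, test_pred)
result[[2]]
```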

See Also

ensemble_mars, ensemble_mlr

Examples

# Example 1
# set data
data <- iris[sample(1:150, 150, FALSE), ]
train <- data[1:100,]
test <- data[101:150, -1]
ensemble_mars(y_index = 1, train = train, test = test, valid_size = 50, family = "gaussian")

# Example 2
# Binomial classification example with the iris data
data <- iris[sample(1:150, size = 150, replace = FALSE),]
# Dummy encode the Species
data <- derive_variables(dataset = data, type = "dummy", integer = TRUE, return_dataset = TRUE)
# Convert the response variable into a binary factor with two classes
data$Species_setosa <- as.factor(data$Species_setosa)
# Extract the test data
test <- data[101:150, c(5,1,2,3,4,6,7)]
# move Species_setosa to the front of the data frame
data <- data[,c(5,1,2,3,4,6,7)]
# fit a MARS model with no bagging
ensemble_mars(y_index = 1, train = data, test = test, valid_size = 50, family = "binomial", type = "class")

# Example 3  
# Poisson prediction with the iris data
counts <- rpois(n = 150, lambda = 3)
data <- iris[sample(1:150, 150, FALSE), ]
# add the simulated counts as the first column (the response)
data <- cbind(counts, data)
train <- data[1:100,]
# drop the response column from the test set
test <- data[101:150, -1]
ensemble_mars(y_index = 1, train = train, test = test, valid_size = 50, family = "poisson", type = "response")

oislen/BuenaVista documentation built on May 16, 2019, 8:12 p.m.