ensemble_glmnet: Bagged Ensemble GLMNET Model.


Description

This function fits a bagged ensemble of generalized linear models via penalized maximum likelihood on a given dataset. It effectively acts as a wrapper for the glmnet package. GLMNET is a versatile regression model that performs feature selection through shrinkage parameters and cross-validation.
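To make the idea of a bagged GLMNET ensemble concrete, here is a minimal sketch for a numeric response. It is not this package's implementation: the helper name bagged_glmnet_sketch, the use of lambda.min, and the choice to average the predictions of the individual models are all assumptions made for illustration. Only cv.glmnet() and its predict() method come from glmnet itself.

```r
# Sketch only: fit n cross-validated glmnet models on random row subsets
# and average their predictions (assumed aggregation rule).
bagged_glmnet_sketch <- function(x, y, newx, n = 10, r = nrow(x) - 1,
                                 alpha = 1, nfolds = 10) {
  preds <- lapply(seq_len(n), function(i) {
    # draw r rows without replacement for this bag
    rows <- sample.int(nrow(x), size = r, replace = FALSE)
    fit <- glmnet::cv.glmnet(x = x[rows, , drop = FALSE], y = y[rows],
                             alpha = alpha, nfolds = nfolds,
                             family = "gaussian")
    # predict at the lambda with the lowest cross-validated error
    as.numeric(predict(fit, newx = newx, s = "lambda.min"))
  })
  # one column per bagged model, averaged row-wise
  rowMeans(do.call(cbind, preds))
}

if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  idx <- sample.int(nrow(iris))
  x <- as.matrix(iris[idx, 2:4])
  y <- iris$Sepal.Length[idx]
  preds <- bagged_glmnet_sketch(x[1:100, ], y[1:100], newx = x[101:150, ],
                                n = 5, r = 80)
  head(preds)
}
```

Averaging is the usual choice for numeric responses; a classification ensemble would more likely combine the models by majority vote.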

Usage

ensemble_glmnet(y_index, train, valid_size = NULL, test = NULL, alpha = 1,
  family = c("gaussian", "binomial", "poisson", "multinomial", "cox",
  "mgaussian"), type = c("link", "response", "coefficients", "nonzero",
  "class"), n = 10, nfolds = 10, r = NULL, r_replace = FALSE,
  c = NULL, c_replace = FALSE, standardize = TRUE, plots = FALSE,
  seed = TRUE)

Arguments

y_index

A column index representing the response variable of the model.

train

A dataset on which the GLMNET model is trained. The order and names of the train set should be exactly the same as those of the test set.

valid_size

A natural number indicating the number of observations to be randomly sampled from the training data for model validation.
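A random validation split of this kind could look like the sketch below. The helper split_validation is hypothetical and only illustrates sampling valid_size rows from the training data; how the package performs the split internally is not documented here.

```r
# Sketch only: hold out `valid_size` randomly chosen rows for validation.
split_validation <- function(train, valid_size) {
  stopifnot(valid_size >= 1, valid_size < nrow(train))
  valid_rows <- sample(seq_len(nrow(train)), size = valid_size)
  list(train = train[-valid_rows, , drop = FALSE],
       valid = train[valid_rows, , drop = FALSE])
}

set.seed(42)
parts <- split_validation(iris[, -5], valid_size = 50)
nrow(parts$train)  # 100
nrow(parts$valid)  # 50
```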

test

A dataset for the GLMNET model to predict on. The order and names of the test set should be exactly the same as those of the train set.

alpha

The elastic net mixing parameter, a numeric value between 0 and 1. When alpha is 1, a LASSO model is fitted. When alpha is 0, a Ridge Regression model is fitted. When alpha is strictly between 0 and 1, an Elastic Net model is fitted. Default is 1.
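As a rough illustration of how alpha blends the two penalties, the sketch below evaluates the elastic net penalty used by glmnet, lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))). The function name elastic_net_penalty and the toy coefficient vector are illustrative only and are not part of this package.

```r
# Sketch only: the elastic net penalty for a coefficient vector `beta`.
elastic_net_penalty <- function(beta, lambda, alpha) {
  ridge_part <- (1 - alpha) / 2 * sum(beta^2)  # L2 (Ridge) component
  lasso_part <- alpha * sum(abs(beta))         # L1 (LASSO) component
  lambda * (ridge_part + lasso_part)
}

beta <- c(1, -2)                                       # toy coefficients
elastic_net_penalty(beta, lambda = 1, alpha = 1)       # LASSO: 3
elastic_net_penalty(beta, lambda = 1, alpha = 0)       # Ridge: 2.5
elastic_net_penalty(beta, lambda = 1, alpha = 0.5)     # Elastic Net: 2.75
```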

family

A character object indicating the type of response variable in the model. One of "gaussian", "binomial", "poisson", "multinomial", "cox" or "mgaussian". Default is "gaussian".

type

The type of prediction required. One of "link", "response", "coefficients", "nonzero" or "class". Default is "link".
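For a binomial model, the difference between "link", "response" and "class" can be sketched in base R as follows. The numbers are toy values and the class labels "0" and "1" are assumptions. The sketch mirrors how glmnet describes these prediction types (linear predictor, fitted probability, most probable class) rather than reproducing this function's output.

```r
# Sketch only: how the three prediction scales relate for family = "binomial".
link <- c(-2, 0.3, 1.5)                    # "link": linear predictor (log-odds)
response <- plogis(link)                   # "response": fitted probabilities
class <- ifelse(response > 0.5, "1", "0")  # "class": most probable label

round(response, 3)  # 0.119 0.574 0.818
class               # "0" "1" "1"
```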

n

A natural number indicating the number of GLMNET models to be built.

nfolds

The number of cross-validation folds to use. Default is 10.

r

The number of rows to be bagged. Note r < nrow(train).

r_replace

A logical object allowing resampling when bagging rows. Default is FALSE.

c

The number of columns to be bagged. Note c < ncol(train).

c_replace

A logical object allowing resampling when bagging columns. Default is FALSE.
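The row and column bagging controlled by r, r_replace, c and c_replace can be pictured with the sketch below. The helper bag_indices is hypothetical. It assumes that NULL means "use everything" and that columns are drawn from the predictors only (never the response), neither of which is confirmed by this page.

```r
# Sketch only: draw the row and column indices for one bagged model.
bag_indices <- function(train, y_index, r = NULL, r_replace = FALSE,
                        c = NULL, c_replace = FALSE) {
  all_rows <- seq_len(nrow(train))
  predictors <- setdiff(seq_len(ncol(train)), y_index)
  rows <- if (is.null(r)) {
    all_rows                                   # no row bagging
  } else {
    all_rows[sample.int(length(all_rows), r, replace = r_replace)]
  }
  cols <- if (is.null(c)) {
    predictors                                 # no column bagging
  } else {
    predictors[sample.int(length(predictors), c, replace = c_replace)]
  }
  list(rows = rows, cols = cols)
}

set.seed(1)
bag <- bag_indices(iris[, -5], y_index = 1, r = 80, c = 2)
length(bag$rows)  # 80
length(bag$cols)  # 2
```

Sampling without replacement requires r and c to be no larger than the number of available rows and predictors, which is consistent with the r < nrow(train) and c < ncol(train) notes above.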

standardize

A logical object indicating whether the predictor variables X should be standardised. Default is TRUE.

plots

A logical object indicating whether plots should be constructed for each bagged model.

seed

Logical, indicating whether a random seed should be set.

file_name

A character object indicating the file name when saving the data frame. The default is NULL. The name must include the .csv suffix.

directory

A character object specifying the directory where the data frame is to be saved as a .csv file.

Value

Outputs a list of information related to the ensemble GLMNET model. The first object of the list is a data frame of the response observations, the corresponding predictions and the error associated with the prediction. The second object of the list is a data frame of model performance metrics. The third object of the list is a vector of predictions / classifications for the specified test set.
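The shape of the returned list might be pictured as in the sketch below. The column names, the choice of RMSE and MAE as metrics, and the toy values are all assumptions for illustration. The actual names and metrics returned by ensemble_glmnet are not specified on this page.

```r
# Sketch only: a mock list with the three components described above.
observed  <- c(5.1, 4.9, 6.3, 5.8)   # toy response values
predicted <- c(5.0, 5.2, 6.0, 5.9)   # toy predictions

# 1. observations, predictions and prediction error
pred_df <- data.frame(observed, predicted, error = observed - predicted)

# 2. model performance metrics (assumed: RMSE and MAE)
metrics <- data.frame(RMSE = sqrt(mean(pred_df$error^2)),
                      MAE  = mean(abs(pred_df$error)))

# 3. test-set predictions (NULL here, as no test set is supplied)
result <- list(pred_df, metrics, NULL)

result[[2]]  # inspect the metrics component by position
```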

See Also

ensemble_mars, ensemble_mlr

Examples

# Example 1
# Numeric prediction example with Iris Data
data <- iris[sample(1:150, size = 150, replace = FALSE),]
# subset training data
train_data <- data[1:150,-5]
# bag rows only
ensemble_glmnet(y_index = 1, train = train_data, valid_size = 50, n = 10, r = 80, alpha = 0.5, family = "gaussian", plots = TRUE)
# bag columns only
ensemble_glmnet(y_index = 1, train = train_data, valid_size = 50, n = 10, c = 2)
# bag rows and columns
ensemble_glmnet(y_index = 1, train = train_data, valid_size = 50, n = 10, r = 80, c = 2)

# Example 2
# Binomial classification example with Iris Data
data <- iris[sample(1:150, size = 150, replace = FALSE),]
# Dummy encode the Species
data <- derive_variables(dataset = data, type = "dummy", integer = TRUE, return_dataset = TRUE)
# Convert the response variable into a binary factor with two classes
data$Species_setosa <- as.factor(data$Species_setosa)
# Extract the test data
test <- data[101:150,c(1,2,3,4,6,7)]
# move Species_setosa to the front of the data frame
data <- data[,c(5,1,2,3,4,6,7)]
# fit a LASSO model with no bagging
ensemble_glmnet(y_index = 1, train = data, valid_size = 50, n = 10, alpha = 1, family = "binomial", type = "class")
# fit a Ridge Regression model with bagged rows
ensemble_glmnet(y_index = 1, train = data, valid_size = 50, n = 20, r = 80, alpha = 0, family = "binomial", type = "class", plots = FALSE)
# fit an Elastic Net model (0 < alpha < 1) to predict the test data with no validation
ensemble_glmnet(y_index = 1, train = data, valid_size = NULL, test = test, n = 10, alpha = 0.5, family = "binomial", type = "class")

# Example 3
# Validation set with Multinomial Class prediction example using iris
data <- iris
data <- data[sample(1:150, size = 150, replace = FALSE),c(5,1,2,3,4)]
# load glmnet for the direct glmnet / cv.glmnet calls below
library(glmnet)
# plots (Species has three classes, so the multinomial family is used)
ense <- glmnet(y = as.vector(data[,1]), x = as.matrix(data[,-1]), alpha = 1, family = "multinomial")
plot(ense)
ense <- cv.glmnet(y = as.vector(data[,1]), nfolds = 10, x = as.matrix(data[,-1]), alpha = 1, family = "multinomial")
plot(ense)
# raw calculations: train on rows 1-100, predict classes for rows 101-150
predict(object = cv.glmnet(y = as.vector(data[1:100,1]), x = as.matrix(data[1:100,-1]), alpha = 1, family = "multinomial"), newx = as.matrix(data[101:150,-1]), type = "class")

# Example 4
# Test set with numeric prediction using titanic
descriptive_statistics(dataset = titanic)
str(titanic)
data <- titanic[,c(6,2,3,7,8,10)]
data <- data[order(data$Age),]
data$Age
train <- data[1:714, ]
valid_size <- 50
test <- data[715:891, ]
ensemble_glmnet(y_index = 1, train = train, valid_size = valid_size, test = test, n = 10, r = 600, c = 4)
ensemble_glmnet(y_index = 1, train = train, valid_size = valid_size, n = 10)

# Example 5 
# Poisson Prediction with Iris Data
counts <- rpois(n = 150, lambda = 3)
data <- iris[sample(1:150, 150, FALSE), ]
data <- cbind(counts, data)
train <- data[1:100,]
# test <- data[101:150,]
test <- data[101:150, -1]
ensemble_glmnet(y_index = 1, train = train, test = test, valid_size = 50, family = "poisson", type = "response")

oislen/BuenaVista documentation built on May 16, 2019, 8:12 p.m.