glm_ensemble: Fit a GLM Ensemble model


View source: R/glm.R

Description

Fit a GLM ensemble model via stats::glm in parallel. Training datasets are chosen to contain an equal number of observations from each class (see major_class_wt to change this ratio), and a single dataset is used to determine the prediction error, and hence the ensemble weight, of each element in the ensemble. Each GLM performs variable selection via the stats::step function with non-verbose output.
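
The fitting of a single ensemble element can be sketched in base R as follows. This is a minimal illustration and not the package's source: the simulated data, the variable names, and the choice to keep every minority-class row are assumptions made for the example.

```r
set.seed(379L)

# Simulated, imbalanced data (hypothetical): y = 1 is the minority class
n_obs <- 500L
df <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs), x3 = rnorm(n_obs))
df$y <- rbinom(n_obs, 1L, plogis(-1.8 + 1.2 * df$x1))

minority_idx <- which(df$y == 1L)
majority_idx <- which(df$y == 0L)

# Balanced training set: all minority rows plus an equal-sized
# random draw from the majority rows (assumed sampling scheme)
train_idx <- c(minority_idx, sample(majority_idx, length(minority_idx)))
train <- df[train_idx, ]

# Full logistic model, then backward selection; trace = 0 silences step()
full_fit <- glm(y ~ ., data = train, family = binomial(link = "logit"))
sel_fit <- step(full_fit, direction = "backward", trace = 0)

coef(sel_fit)
```

Repeating this with a fresh majority-class draw each time would give the n elements of an ensemble.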

Usage

glm_ensemble(df, dep_var, cols = which(names(df) != dep_var), n = 100L,
  level = NULL, major_class_wt = 1, seed = 379L, test_pct = 0.33,
  direction = "backward", family = binomial(link = "logit"),
  leave_cores = NULL)

Arguments

df

A data.frame for analysis.

dep_var

A character string denoting the dependent variable in df.

cols

A vector of column indices corresponding to the variables you wish to regress on. This allows for variable (de)selection prior to model building. Defaults to using all columns except dep_var.

n

An integer denoting the number of ensemble elements to build; defaults to 100L.

level

The level of interest. If NULL, takes the 2nd level of a factor variable or the 2nd unique value of a non-factor variable.

major_class_wt

Controls the number of majority class cases selected in each partition as a multiple of the number of minority class observations. Defaults to 1, which produces equal-sized sets of minority and majority class cases in each partition. Must be greater than or equal to 1.

seed

An integer. Seed for reproducibility; defaults to 379L.

test_pct

A number in (0,1) specifying the size of the test dataset as a proportion of the data; defaults to 0.33.

direction

A character string giving the search direction passed to stats::step; defaults to "backward".

family

Specifies the error distribution and link function used by the GLMs; defaults to binomial(link = "logit"). See stats::family.

leave_cores

An integer giving the number of cores to leave unused; defaults to NULL.
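
To illustrate how major_class_wt changes the class balance of a partition, here is a small base R sketch. The helper draw_partition is hypothetical and not part of glmEnsemble; it assumes the level of interest is the minority class and that the majority count is rounded down.

```r
# Hypothetical helper: returns row indices for one partition
draw_partition <- function(y, level, major_class_wt = 1) {
  stopifnot(major_class_wt >= 1)
  minority_idx <- which(y == level)
  majority_idx <- which(y != level)
  # Majority cases drawn = multiple of minority count, capped at what exists
  n_major <- min(length(majority_idx),
                 floor(major_class_wt * length(minority_idx)))
  c(minority_idx, majority_idx[sample.int(length(majority_idx), n_major)])
}

set.seed(379L)
y <- c(rep(1L, 40), rep(0L, 400))  # 40 minority, 400 majority cases

idx_equal  <- draw_partition(y, level = 1L, major_class_wt = 1)
idx_double <- draw_partition(y, level = 1L, major_class_wt = 2)

table(y[idx_equal])   # class counts with the default weight
table(y[idx_double])  # class counts with twice as many majority cases
```

With major_class_wt = 1 the partition holds 40 cases of each class; with major_class_wt = 2 it holds 40 minority and 80 majority cases.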

Value

A list with a matrix of coefficients from each ensemble element, the element weights, and the weighted coefficient estimates.
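
The relationship between the three returned pieces can be sketched as follows. The numbers are made up, and the weighting rule (weights proportional to one minus the test error, with dropped variables recorded as 0) is an assumption for illustration; the package may compute its weights differently.

```r
# Hypothetical coefficient matrix: one row per ensemble element;
# a variable dropped by step() is recorded as 0 (assumption)
coef_mat <- matrix(
  c(-1.0, 1.2, 0.0,
    -0.8, 1.0, 0.3,
    -1.2, 1.4, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("(Intercept)", "x1", "x2"))
)

# Hypothetical prediction error of each element on the test dataset
test_error <- c(0.20, 0.30, 0.25)

# Assumed rule: lower error gives higher weight; weights sum to 1
weights <- (1 - test_error) / sum(1 - test_error)

# Weighted coefficient estimates: weights[i] scales row i, then sum by column
weighted_coef <- colSums(coef_mat * weights)
weighted_coef
```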


alexWhitworth/glmEnsemble documentation built on Nov. 5, 2021, 6:55 a.m.