glm_ensemble: Fit a GLM Ensemble model
In alexWhitworth/glmEnsemble: Builds GLM ensemble models in parallel


View source: R/glm.R

Fit a GLM ensemble model via `stats::glm` in parallel. Each training dataset is chosen to contain an equal number of observations from each class (by default), and a single test dataset is used to determine the prediction error, and hence the ensemble weight, of each element in the ensemble. Each GLM performs variable selection via the `stats::step` function with non-verbose output.
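The procedure described above can be illustrated for a single ensemble element in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the simulated data, the 0.5 classification cutoff, and the use of misclassification rate as the prediction error are all illustrative choices, and the way the package converts error into an ensemble weight is not documented here.

```r
# Sketch of ONE ensemble element (illustrative only; not the package's code).
set.seed(379)

# Simulated, imbalanced binary data (hypothetical).
n_obs <- 1000
dat <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs), x3 = rnorm(n_obs))
dat$y <- rbinom(n_obs, size = 1, prob = plogis(-2 + 1.5 * dat$x1))

# Hold out a single test set, shared by every element of the ensemble.
test_idx   <- sample(n_obs, size = round(0.33 * n_obs))
test       <- dat[test_idx, ]
train_pool <- dat[-test_idx, ]

# Balanced training partition: all minority cases plus an equal-sized
# random sample of majority cases.
minority <- which(train_pool$y == 1)
majority <- which(train_pool$y == 0)
train <- train_pool[c(minority, sample(majority, length(minority))), ]

# Fit the full GLM, then run backward stepwise selection silently (trace = 0).
full_fit <- glm(y ~ x1 + x2 + x3, data = train,
                family = binomial(link = "logit"))
fit <- step(full_fit, direction = "backward", trace = 0)

# Prediction error on the shared test set (assumed: misclassification rate
# at a 0.5 cutoff).
pred       <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
test_error <- mean(pred != test$y)
```

Repeating the sampling and fitting steps with different majority-class samples would give the remaining elements of an ensemble.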

glm_ensemble(df, dep_var, cols = which(names(df) != dep_var), n = 100L,
  level = NULL, major_class_wt = 1, seed = 379L, test_pct = 0.33,
  direction = "backward", family = binomial(link = "logit"),
  leave_cores = NULL)
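
A call might look like the following. This is a hedged sketch: the data frame is simulated, the argument values are illustrative, and it assumes the package is installed (for example from the `alexWhitworth/glmEnsemble` GitHub repository) and that `glm_ensemble` is exported. The structure of the returned object is not described in this section, so the example does not inspect it.

```r
# Hypothetical example data: a rare binary outcome and three predictors.
set.seed(379L)
df <- data.frame(
  y  = factor(rbinom(500, 1, 0.15), levels = c(0, 1), labels = c("no", "yes")),
  x1 = rnorm(500),
  x2 = rnorm(500),
  x3 = rnorm(500)
)

# Only run the call if the package is available.
if (requireNamespace("glmEnsemble", quietly = TRUE)) {
  ens <- glmEnsemble::glm_ensemble(
    df,
    dep_var     = "y",
    n           = 10L,        # a small ensemble, to keep the example fast
    test_pct    = 0.33,
    direction   = "backward",
    leave_cores = 1L          # leave one core unused
  )
}
```

With `level = NULL` left at its default, the second factor level (`"yes"` here) is treated as the level of interest.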

`df`	A `data.frame` for analysis
`dep_var`	A character string denoting the dependent variable in `df`.
`cols`	A vector of column indices corresponding to the variables you wish to regress on. This allows for variable (de)selection prior to model building. Defaults to all columns other than `dep_var`.
`n`	An integer denoting the number of ensembles to build; defaults to `100L`.
`level`	The level of interest in `dep_var`. If `NULL`, takes the 2nd level of a factor variable or the 2nd unique value of a non-factor variable.
`major_class_wt`	Controls the number of majority class cases selected in each partition as a multiple of the number of minority class observations. Defaults to `1`, which produces equal-sized sets of minority and majority class cases in each partition. Must be greater than or equal to 1.
`seed`	An integer. Seed for reproducibility; defaults to `379L`.
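`test_pct`	A number in (0,1) specifying the size of the test dataset as a proportion of the data; defaults to `0.33`.
`direction`	A character string giving the direction of the stepwise search, as accepted by `stats::step`; defaults to `"backward"`.
`family`	A family object specifying the error distribution and link function used by `glm`; defaults to `binomial(link = "logit")`. See `stats::family`.
`leave_cores`	An integer giving the number of cores to leave unused; defaults to `NULL`.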
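
The documented defaults for `cols`, `level`, and `major_class_wt` can be illustrated with a few lines of base R. The helpers `default_level` and `partition_sizes` below are hypothetical names introduced for illustration and are not part of the package; in particular, rounding the majority count down with `floor` is an assumption.

```r
# Hypothetical helper mirroring the documented default for `level`:
# the 2nd factor level, or the 2nd unique value of a non-factor.
default_level <- function(x) {
  if (is.factor(x)) levels(x)[2] else unique(x)[2]
}

default_level(factor(c("no", "yes", "no")))  # "yes"
default_level(c(0, 1, 1, 0))                 # 1

# Hypothetical helper: class counts in one partition for a given
# `major_class_wt` (rounding via floor() is an assumption).
partition_sizes <- function(n_minority, major_class_wt = 1) {
  stopifnot(major_class_wt >= 1)
  c(minority = n_minority, majority = floor(major_class_wt * n_minority))
}

partition_sizes(50)      # minority 50, majority 50
partition_sizes(50, 2)   # minority 50, majority 100

# The default for `cols` selects every column except the dependent variable.
example_df <- data.frame(y = c(0, 1), a = 1:2, b = 3:4)
which(names(example_df) != "y")  # 2 3
```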