model_selection: model_selection


View source: R/RegressionModelPipeline.R

Description

Main function for model selection from a set of many variables

Usage

model_selection(df, observations, response, family = "gaussian",
  model = glm, interactions = FALSE, test = c("Wald", "LRT"),
  thresh_screen = 0.2, only_return_selected = FALSE, K = 10,
  sig_vars_thresh = NULL, robust = FALSE, N = 1, aic_k = 2,
  robust_n = 100, alpha = 1)

Arguments

df,

a data.frame containing the response and observations variables. Factors with more than 2 levels are only implemented for test='LRT'

observations,

a character vector of the names of independent/observations variables in df

response,

a character vector of the names of dependent/response variables in df

family,

a character string naming the family associated with the submitted model, e.g. 'gaussian', 'binomial', 'poisson'

model,

the model-fitting function used to test the variables, e.g. glm or lm

interactions,

a boolean indicating whether interactions should be assessed. The signature above defaults to FALSE. If set to NULL, interactions are examined according to the constraints set by sig_vars_thresh; an explicit TRUE or FALSE overrides the recommendations of sig_vars_thresh

test,

a character string selecting the test: 'LRT' for a Likelihood Ratio Test of whether a variable improves the model likelihood, or 'Wald' for a Wald test of whether a coefficient differs from 0

thresh_screen,

a numeric value indicating the p-value cutoff for the univariate screening

only_return_selected,

a boolean value. If true, only models with p-value less than the threshold will be returned. Otherwise, all models will be returned.

K,

a numeric value indicating the number of folds to use for k-fold cross-validation. K=10 by default. K=0 to skip k-fold validation.

sig_vars_thresh,

a list specifying the maximal number of significant variables allowed for each final model generating method. NULL (self initializing) by default.

robust,

a boolean indicating whether regularization is run multiple times to get a robust indication of the underlying structure

N,

a numeric value, default N=1, indicating the number of cross validation iterations to perform

robust_n,

number of iterations for the robust glmnet run

alpha,

alpha parameter if glmnet is used for a regularization method
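
To make the test and thresh_screen arguments concrete, here is a minimal base-R sketch of a univariate screen on mtcars. It is not the package's implementation: the helper screen_one and the intercept-only baseline null_fit are hypothetical names introduced for illustration. The sketch assumes each variable is fitted alone against the response and kept when its p-value falls below the cutoff.

```r
# Sketch of a univariate screen on mtcars (base R only; not the package's code).
response <- "mpg"
observations <- setdiff(colnames(mtcars), response)
thresh_screen <- 0.2

# Intercept-only model, used as the baseline for the likelihood ratio test.
null_fit <- glm(mpg ~ 1, data = mtcars, family = gaussian)

# Hypothetical helper: fit response ~ var and return both p-values.
screen_one <- function(var) {
  fit <- glm(reformulate(var, response = response), data = mtcars,
             family = gaussian)
  # Wald-type p-value: the coefficient test reported by summary().
  wald_p <- coef(summary(fit))[var, 4]
  # LRT p-value: compare the one-variable model against the null model.
  lrt_p <- anova(null_fit, fit, test = "LRT")[2, "Pr(>Chi)"]
  c(wald = wald_p, lrt = lrt_p)
}

# One row per candidate variable, one column per test.
p_values <- t(sapply(observations, screen_one))

# Keep the variables whose LRT p-value passes the screening cutoff.
selected <- rownames(p_values)[p_values[, "lrt"] < thresh_screen]
print(round(p_values, 4))
print(selected)
```

Indexing the coefficient table by variable name works here because every mtcars column is numeric. A factor with several levels produces several coefficient rows, which is consistent with the note under df that such factors are only supported for test='LRT'.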

Value

a list containing the univariate models, the final selected model, and cross-validation statistics.

Examples

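# Select a model for mpg from all other mtcars columns, using the LRT and 5-fold CV
mod <- model_selection(df = mtcars, observations = colnames(mtcars)[-1],
  response = "mpg", interactions = FALSE, test = "LRT", K = 5,
  family = "gaussian", model = glm)
# Pass the result to vis() and print the first two elements it returns
out <- vis(mod)
print(out[[1]])
print(out[[2]])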
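
For readers unfamiliar with the K argument, the following base-R sketch shows what 5-fold cross-validation of a single glm looks like. It is an illustration only, not the package's internal procedure. The formula mpg ~ wt + hp is an arbitrary stand-in for a selected model, and RMSE is an assumed choice of error measure.

```r
# Sketch of K-fold cross-validation for one glm (base R only; not the package's code).
set.seed(1)  # fold assignment is random, so fix the seed for repeatability
K <- 5
n <- nrow(mtcars)

# Assign each row to one of K folds of near-equal size.
folds <- sample(rep(seq_len(K), length.out = n))

# For each fold: fit on the other folds, predict the held-out rows, score by RMSE.
fold_rmse <- sapply(seq_len(K), function(k) {
  train <- mtcars[folds != k, ]
  test  <- mtcars[folds == k, ]
  # Hypothetical stand-in for a selected model.
  fit <- glm(mpg ~ wt + hp, data = train, family = gaussian)
  pred <- predict(fit, newdata = test, type = "response")
  sqrt(mean((test$mpg - pred)^2))
})

print(fold_rmse)        # one held-out error per fold
print(mean(fold_rmse))  # average held-out error across folds
```

Repeating this procedure with different fold assignments and averaging the results is one plausible reading of the N argument, which counts cross-validation iterations.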

LewisLabUCSD/RegressionModelPipeline documentation built on Jan. 11, 2021, 10:33 p.m.