ForwardModel.Res: NeRI-based feature selection procedure for linear, logistic,...


Description

This function performs a bootstrap sampling to rank the most frequent variables that statistically aid the models by minimizing the residuals. After the frequency rank, the function uses a forward selection procedure to create a final model, whose terms all have a significant contribution to the net residual improvement (NeRI).
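The bootstrap ranking stage described above can be illustrated with a minimal base R sketch. It is not the package's implementation. It assumes that NeRI is the share of observations whose absolute residual shrinks minus the share whose absolute residual grows, and that significance is judged with a one-sided binomial test. The synthetic data, the helper names neri and improvement_p, and the variable names are hypothetical.

```r
set.seed(42)

# Synthetic data (assumption): y depends on x1 and x2; x3 and x4 are noise.
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

candidates <- c("x1", "x2", "x3", "x4")
loops <- 50
pvalue <- 0.05

# Assumed NeRI definition: fraction of improved minus fraction of worsened
# absolute residuals.
neri <- function(old_res, new_res) {
  mean(abs(new_res) < abs(old_res)) - mean(abs(new_res) > abs(old_res))
}

# One-sided binomial p-value for "more than half of the residuals improved".
improvement_p <- function(old_res, new_res) {
  improved <- sum(abs(new_res) < abs(old_res))
  binom.test(improved, length(old_res), p = 0.5,
             alternative = "greater")$p.value
}

selection_count <- setNames(integer(length(candidates)), candidates)

for (i in seq_len(loops)) {
  # Bootstrap sample: rows drawn with replacement.
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    base_res <- residuals(lm(reformulate(c("1", selected), response = "y"),
                             data = boot))
    # p-value of the residual improvement for each remaining candidate.
    pvals <- sapply(remaining, function(v) {
      new_fit <- lm(reformulate(c("1", selected, v), response = "y"),
                    data = boot)
      improvement_p(base_res, residuals(new_fit))
    })
    best <- names(which.min(pvals))
    if (pvals[[best]] > pvalue) break   # no candidate improves significantly
    selected <- c(selected, best)
  }
  selection_count[selected] <- selection_count[selected] + 1L
}

# Selection frequency per variable, most frequent first.
ranked <- sort(selection_count / loops, decreasing = TRUE)
print(ranked)

# NeRI of the best single-variable model against the intercept-only model.
top <- names(ranked)[1]
print(neri(residuals(lm(y ~ 1, data = dat)),
           residuals(lm(reformulate(top, response = "y"), data = dat))))
```

In this sketch the frequency table plays the role of the rank; a final forward pass over the variables in that order would then build the reported model.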

Usage

	ForwardSelection.Model.Res(size = 100,
	                     fraction = 1,
	                     pvalue = 0.05,
	                     loops = 100,
	                     covariates = "1",
	                     Outcome,
	                     variableList,
	                     data,
	                     maxTrainModelSize = 20,
	                     type = c("LM", "LOGIT", "COX"),
	                     testType = c("Binomial", "Wilcox", "tStudent", "Ftest"),
	                     timeOutcome = "Time",
	                     cores = 6,
	                     randsize = 0,
	                     featureSize = 0)
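A call following the signature above might look like the sketch below. The synthetic data frame, the column names, and the reduced loops and cores values are assumptions chosen to keep the example small. The call is guarded so that the script still runs when FRESA.CAD is not installed.

```r
set.seed(1)

# Synthetic data (assumption): y depends on x1 and x2; x3 is noise.
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1.5 * dat$x1 - dat$x2 + rnorm(n)

# Two-column variable list: candidate names first, descriptions second.
varList <- data.frame(Name = c("x1", "x2", "x3"),
                      Description = c("signal 1", "signal 2", "noise"),
                      stringsAsFactors = FALSE)

# Run the selection only if the package is available.
if (requireNamespace("FRESA.CAD", quietly = TRUE)) {
  fit <- FRESA.CAD::ForwardSelection.Model.Res(
    size = nrow(varList),
    loops = 20,
    Outcome = "y",
    variableList = varList,
    data = dat,
    type = "LM",
    testType = "Binomial",
    cores = 1)
  print(fit$var.names)   # names of the features in the final model
  print(fit$formula)     # formula used to fit the final model
}
```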

Arguments

size

The number of candidate variables to be tested (the first size variables from variableList)

fraction

The fraction of the data (sampled with replacement) to be used as the training set

pvalue

The maximum p-value, associated with the NeRI, allowed for a term in the model (controls the false selection rate)

loops

The number of bootstrap loops

covariates

A string of the type "1 + var1 + var2" that defines which variables will always be included in the models (as covariates)
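As an illustration of how a string of this form can be turned into model formulas, the base R sketch below pastes it after the outcome name. The covariate and candidate names are hypothetical, and whether the function builds its formulas exactly this way is an assumption.

```r
Outcome <- "y"
covariates <- "1 + age + sex"   # hypothetical covariates, always kept
candidate <- "biomarker"        # hypothetical candidate variable

# Base model: the outcome explained by the fixed covariates only.
base_formula <- as.formula(paste(Outcome, "~", covariates))

# Extended model: the same covariates plus one candidate term.
extended_formula <- as.formula(paste(Outcome, "~", covariates, "+", candidate))

print(base_formula)
print(extended_formula)
print(all.vars(extended_formula))
```

With the default covariates = "1", the base model reduces to an intercept-only fit.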

Outcome

The name of the column in data that stores the variable to be predicted by the model

variableList

A data frame with two columns. The first column must contain the names of the candidate variables, and the second column their descriptions

data

A data frame where all variables are stored in different columns

maxTrainModelSize

Maximum number of terms that can be included in the model

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

testType

Type of statistical test to be evaluated by the improvedResiduals function: Binomial test ("Binomial"), Wilcoxon rank-sum test ("Wilcox"), Student's t-test ("tStudent"), or F-test ("Ftest")
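To give a feel for what each option compares, the sketch below applies the base R counterparts of the four tests to two sets of simulated residuals. It does not call improvedResiduals. The simulated residuals, the one-sided alternatives, and the pairing choices are assumptions of this sketch.

```r
set.seed(7)
n <- 100

# Simulated residuals (assumption): the new model roughly halves the errors.
old_res <- rnorm(n, sd = 2)
new_res <- 0.5 * old_res + rnorm(n, sd = 0.3)

# Binomial: did more than half of the absolute residuals shrink?
p_binomial <- binom.test(sum(abs(new_res) < abs(old_res)), n, p = 0.5,
                         alternative = "greater")$p.value

# Wilcoxon rank-sum: are the old absolute residuals shifted upward?
p_wilcox <- wilcox.test(abs(old_res), abs(new_res),
                        alternative = "greater")$p.value

# Student's t (paired): is the mean absolute residual larger before?
p_tstudent <- t.test(abs(old_res), abs(new_res), paired = TRUE,
                     alternative = "greater")$p.value

# F-test: is the residual variance larger before?
p_ftest <- var.test(old_res, new_res, alternative = "greater")$p.value

print(c(Binomial = p_binomial, Wilcox = p_wilcox,
        tStudent = p_tstudent, Ftest = p_ftest))
```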

timeOutcome

The name of the column in data that stores the time to event (needed only for a Cox proportional hazards regression model fitting)

cores

The number of cores to be used for parallel processing

randsize

The model size of a random outcome. If randsize is less than zero, the function will estimate the size

featureSize

The original number of features to be explored in the data frame.

Value

final.model

An object of class lm, glm, or coxph containing the final model

var.names

A vector with the names of the features that were included in the final model

formula

An object of class formula with the formula used to fit the final model

ranked.var

An array with the ranked frequencies of the features

formula.list

A list containing objects of class formula with the formulas used to fit the models found at each cycle

variableList

A list of variables used in the forward selection

Author(s)

Jose G. Tamez-Pena and Antonio Martinez-Torteya

See Also

ForwardSelection.Model.Bin


FRESA.CAD documentation built on Jan. 13, 2021, 3:39 p.m.