screen.rf.exact: Random forest screener limits selected variables and selects...


Description

Random forest screener for SuperLearner() that always selects the user-specified individual variables and returns a specified overall number of variables.

Usage

screen.rf.exact(Y, X, family, nVar = 10, nFix = 3,
  fixed.var.index = var.index, ntree = 500, mtry = ifelse(family$family ==
  "gaussian", floor(sqrt(ncol(X))), max(floor(ncol(X)/3), 1)),
  nodesize = ifelse(family$family == "gaussian", 5, 1), ...)
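The default mtry and nodesize in the signature above depend on the family and on the number of columns in X. As a small illustration, the same expressions can be evaluated by hand; the 20-column X here is a made-up placeholder, not data from the package.

```r
# Toy predictor frame with 20 columns (placeholder data for illustration only).
X <- as.data.frame(matrix(0, nrow = 5, ncol = 20))

# Evaluate the default expressions from the usage line for a given family.
default_args <- function(family, X) {
  list(
    mtry = ifelse(family$family == "gaussian",
                  floor(sqrt(ncol(X))), max(floor(ncol(X) / 3), 1)),
    nodesize = ifelse(family$family == "gaussian", 5, 1)
  )
}

gauss_args <- default_args(gaussian(), X)  # mtry = 4, nodesize = 5
binom_args <- default_args(binomial(), X)  # mtry = 6, nodesize = 1
```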

Arguments

Y

outcome variable (specified in SuperLearner())

X

data frame

nVar

number of variables for the screener to select

nFix

number of individual variables that are always passed to SuperLearner()

var.index

indices of variables to always be included by the screener

Details

This function can be slow because it currently works by searching the random forest variable rankings for the user-selected ("fixed") variables. If all of the fixed variables already rank in the top nVar, the selection is unchanged. If some of them do not, the screener keeps a smaller set of top-ranked variables and adds the missing fixed variables. For example, if the overall number of variables to select is 10 and 2 of the fixed variables rank outside the top 10, it selects the top 8 and sets the 2 fixed variables outside the top 10 to TRUE.
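The selection rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: select_with_fixed is a hypothetical helper, the importance scores are invented, and the sketch assumes that the slots left after the fixed variables are filled with the highest-ranked non-fixed variables, which reproduces the top-8-plus-2 example.

```r
# Hypothetical helper (not part of SLscreeners): returns a logical vector
# marking which columns are kept, given one importance score per column.
select_with_fixed <- function(importance, nVar, fixed.var.index) {
  stopifnot(length(fixed.var.index) <= nVar)
  keep <- rep(FALSE, length(importance))
  # Fixed variables are always kept, whatever their rank.
  keep[fixed.var.index] <- TRUE
  # Column indices ordered from most to least important.
  ord <- order(importance, decreasing = TRUE)
  # Fill the remaining slots with the best-ranked non-fixed variables.
  n_free <- nVar - length(fixed.var.index)
  candidates <- setdiff(ord, fixed.var.index)
  keep[head(candidates, n_free)] <- TRUE
  keep
}

# Made-up scores for 12 variables; variables 11 and 12 rank last but are fixed.
importance <- c(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
keep <- select_with_fixed(importance, nVar = 10, fixed.var.index = c(11, 12))
which(keep)
```

With these scores the two fixed variables fall outside the top 10, so the sketch keeps variables 1 to 8 plus the fixed variables 11 and 12, for 10 in total.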

Super Learner

See SuperLearner() documentation for information on additional arguments and instructions on implementing SuperLearner().

See Also

screen.glmnet.fix for the lasso screener

Examples

# If you do not know the indices of the variables you always want to include,
# you can get them from the variable names, where newdat is the data frame:

var.index <- c(which(colnames(newdat) == "sex"), which(colnames(newdat) == "age"),
               which(colnames(newdat) == "emp_active"))

sl-bergquist/SLscreeners documentation built on Dec. 2, 2019, 1:29 a.m.