screen.randomForest.imp: "Best of both worlds" Random Forest screening algorithm
In saraemoore/SLScreenExtra: A Collection of Additional Feature Selection Algorithms and Utilities for SuperLearner

screen.randomForest.imp

R Documentation

"Best of both worlds" Random Forest screening algorithm

Description

Combines the customizability of screen.randomForest with the cutoff selectors of the FSelector package.

Usage

screen.randomForest.imp(
  Y,
  X,
  family,
  obsWeights,
  id,
  selector = c("cutoff.biggest.diff", "cutoff.k", "cutoff.k.percent"),
  k = switch(selector, cutoff.k = ceiling(0.5 * ncol(X)), cutoff.k.percent = 0.5, NULL),
  nTree = 1000,
  mTry = ifelse(family$family == "gaussian", floor(sqrt(ncol(X))), max(floor(ncol(X)/3),
    1)),
  nodeSize = ifelse(family$family == "gaussian", 5, 1),
  importanceType = c("permutation", "impurity"),
  maxNodes = NULL,
  verbose = FALSE,
  ...
)
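
Several defaults in the signature above are expressions rather than constants, so their values depend on `selector`, `family`, and the width of `X`. The sketch below copies those expressions into a hypothetical helper, `default_args()` (not part of SLScreenExtra), to show what they evaluate to. It assumes `selector` has already been reduced to a single string.

```r
# Hypothetical helper for illustration only: evaluates the default expressions
# shown in the Usage block for one selector, one family, and one X.
default_args <- function(selector, family, X) {
  p <- ncol(X)
  list(
    k = switch(selector,
               cutoff.k = ceiling(0.5 * p),   # keep half of the features
               cutoff.k.percent = 0.5,        # keep a proportion of 0.5
               NULL),                         # cutoff.biggest.diff takes no k
    mTry = ifelse(family$family == "gaussian",
                  floor(sqrt(p)),
                  max(floor(p / 3), 1)),
    nodeSize = ifelse(family$family == "gaussian", 5, 1)
  )
}

X <- mtcars[, setdiff(colnames(mtcars), "mpg")]  # 10 predictor columns

d_gauss <- default_args("cutoff.k", gaussian(), X)
d_binom <- default_args("cutoff.biggest.diff", binomial(), X)
str(d_gauss)
str(d_binom)
```

With 10 predictors, `cutoff.k` defaults to keeping 5 features, while `cutoff.biggest.diff` leaves `k` as `NULL`.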

Arguments

`Y`	Outcome (numeric vector). See `SuperLearner` for specifics.
`X`	Predictor variable(s) (data.frame or matrix). See `SuperLearner` for specifics.
`family`	Error distribution to be used in the model: `gaussian` or `binomial`. Determines the defaults of `mTry` and `nodeSize` (see Usage). See `SuperLearner` for specifics.
`obsWeights`	Optional numeric vector of observation weights. Currently unused.
`id`	Cluster identification variable. Currently unused.
`selector`	A string corresponding to a subset selecting function implemented in the FSelector package. One of: `cutoff.biggest.diff` (default), `cutoff.k`, or `cutoff.k.percent`.
`k`	Passed through to the `selector` in the case where `selector` is `cutoff.k` or `cutoff.k.percent`. Otherwise, should remain NULL (the default). For `cutoff.k`, this is an integer indicating the number of features to keep from `X`. For `cutoff.k.percent`, this is instead the proportion of features to keep.
`nTree`	Integer. Number of trees. Default: 1000.
`mTry`	Integer. Number of columns of `X` sampled at each split. Default: square root (`gaussian()` family) or one third (`binomial()` family) of total number of features, rounded down.
`nodeSize`	Integer. Minimum number of observations in terminal nodes. Default: 5 (`gaussian()` family) or 1 (`binomial()` family).
`importanceType`	Importance type. `"permutation"` (default) indicates mean decrease in accuracy (for `binomial()` family) or percent increase in mean squared error (for `gaussian()` family) when comparing predictions using the original variable versus a permuted version of the variable (column of `X`). `"impurity"` indicates increase in node purity achieved by splitting on that column of `X` (for `binomial()` family, measured by Gini index; for `gaussian()`, measured by residual sum of squares). See `randomForest` for more details, where `"permutation"` corresponds to `type = 1` and `"impurity"` corresponds to `type = 2`.
`maxNodes`	Maximum number of terminal nodes allowed in a tree. Default (`NULL`) indicates that trees should be grown to maximum possible size. See `randomForest` for more details.
`verbose`	Should debugging messages be printed? Default: `FALSE`.
`...`	Currently unused.
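
The three `selector` options differ only in how they cut a ranked list of importance scores. The following base R sketch approximates each rule on made-up scores. It is not FSelector's code, and details such as tie handling and the rounding used for the proportion are assumptions.

```r
# Toy importance scores (made-up numbers for illustration only).
imp <- c(a = 0.90, b = 0.85, c = 0.30, d = 0.10)
ranked <- sort(imp, decreasing = TRUE)

# In the spirit of cutoff.k: keep the k highest-scoring features (k = 2).
top_k <- names(ranked)[seq_len(2)]

# In the spirit of cutoff.k.percent: keep a proportion of the features
# (k = 0.75 of 4 features, rounded, gives 3).
top_pct <- names(ranked)[seq_len(round(0.75 * length(ranked)))]

# In the spirit of cutoff.biggest.diff: keep everything ranked above the
# largest drop between consecutive scores.
drops <- -diff(ranked)   # drops between neighbours: 0.05, 0.55, 0.20
top_gap <- names(ranked)[seq_len(which.max(drops))]

top_k
top_pct
top_gap
```

Here the largest drop falls between `b` and `c`, so the gap-based rule keeps `a` and `b` without needing a `k`.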

Value

A logical vector with length equal to `ncol(X)`. `TRUE` marks the columns of `X` that pass the screen and are retained.
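
As a minimal sketch of how such a vector can be used outside of `SuperLearner`, the snippet below subsets a predictor data frame with a logical vector of the right length. The vector `keep` is a hypothetical stand-in for the function's output, not a real result.

```r
X <- iris[, setdiff(colnames(iris), "Species")]  # 4 predictor columns

# Hypothetical screening result: one logical entry per column of X.
keep <- c(TRUE, FALSE, TRUE, TRUE)
stopifnot(is.logical(keep), length(keep) == ncol(X))

# drop = FALSE keeps a data.frame even if only one column survives.
X_screened <- X[, keep, drop = FALSE]
colnames(X_screened)
```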

Examples

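library(SLScreenExtra)

# Binary outcome: keep the top 75% of iris predictors by permutation importance
data(iris)
Y <- as.numeric(iris$Species == "setosa")
X <- iris[, -which(colnames(iris) == "Species")]
screen.randomForest.imp(Y, X, binomial(), selector = "cutoff.k.percent", k = 0.75)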

# Continuous outcome: rank by impurity importance, default selector (cutoff.biggest.diff)
data(mtcars)
Y <- mtcars$mpg
X <- mtcars[, -which(colnames(mtcars) == "mpg")]
screen.randomForest.imp(Y, X, gaussian(), importanceType = "impurity")

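# based on examples in SuperLearner package
# Simulate 100 observations of 20 predictors; only X1, X2, and X3 affect Y
set.seed(1)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)

# Compare SL.glm on all predictors with SL.glm on the screened predictors
library(SuperLearner)
sl <- SuperLearner(Y, X, family = gaussian(), cvControl = list(V = 2),
                   SL.library = list(c("SL.glm", "All"),
                                     c("SL.glm", "screen.randomForest.imp")))
sl
# Which columns each screening algorithm retained
sl$whichScreen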

saraemoore/SLScreenExtra documentation built on Nov. 4, 2023, 9:31 p.m.
