screen.earth: Non-linear regression screening algorithm

screen.earth R Documentation

Non-linear regression screening algorithm

Description

Performs feature selection via "Multivariate Adaptive Regression Splines" (MARS) / "Fast MARS", using the implementation in the earth package.

Usage

screen.earth(
  Y,
  X,
  family,
  obsWeights,
  id,
  selector = c("cutoff.biggest.diff", "cutoff.k", "cutoff.k.percent"),
  k = switch(selector, cutoff.k = ceiling(0.5 * ncol(X)), cutoff.k.percent = 0.5, NULL),
  importanceType = c("nsubsets", "rss", "gcv"),
  degree = 2,
  penalty = 3,
  kForward = max(21, 2 * ncol(X) + 1),
  pMethod = "cv",
  nFold = 5,
  ...
)

Arguments

Y

Outcome (numeric vector). See SuperLearner for specifics.

X

Predictor variable(s) (data.frame or matrix). See SuperLearner for specifics.

family

Error distribution to be used in the model: gaussian or binomial. Currently unused. See SuperLearner for specifics.

obsWeights

Observation weights, passed on via earth's weights argument. Expect slower computation if obsWeights are provided.

id

Cluster identification variable. Currently unused.

selector

A string naming a subset-selecting function implemented in the FSelector package. One of: cutoff.biggest.diff (default), cutoff.k, or cutoff.k.percent.

k

Passed through to the selector in the case where selector is cutoff.k or cutoff.k.percent. Otherwise, should remain NULL (the default). For cutoff.k, this is an integer indicating the number of features to keep from X. For cutoff.k.percent, this is instead the proportion of features to keep.

importanceType

Variable importance criterion. One of: "nsubsets" ("number of subsets"), "rss", or "gcv".

degree

Maximum degree of interaction. Default: 2. A value of 1 means no interaction terms are used.

penalty

Generalized Cross Validation (GCV) penalty per knot. Default: 3.

kForward

Maximum number of terms created by the forward pass (including the intercept). Default: the greater of 21 and twice the number of features in X plus one, i.e. max(21, 2 * ncol(X) + 1).

pMethod

Pruning method. Default: "cv": select the number of terms yielding the maximum mean out-of-fold R-Squared over the cross-validated model fits (CVRSq). See earth for other possible values.

nFold

Number of cross-validation folds. Must be >0 if pMethod = "cv". Default: 5.

...

Currently unused.
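
As a rough guide to how the tuning arguments above relate to earth, the sketch below fits a MARS model directly and reads off variable importance. It assumes that degree, penalty, kForward, pMethod, and nFold correspond to earth's degree, penalty, nk, pmethod, and nfold arguments, and that importance comes from evimp. This mapping is inferred from the argument descriptions here, not taken from the screen.earth source.

```r
library(earth)

data(mtcars)
Y <- mtcars$mpg
X <- mtcars[, colnames(mtcars) != "mpg"]

set.seed(1)  # cross-validation folds are assigned at random

# Assumed mapping: kForward -> nk, pMethod -> pmethod, nFold -> nfold
fit <- earth(x = X, y = Y,
             degree = 2,
             penalty = 3,
             nk = max(21, 2 * ncol(X) + 1),
             pmethod = "cv",
             nfold = 5)

# trim = FALSE also lists predictors that the pruned model does not use
imp <- evimp(fit, trim = FALSE)

# One importance score per predictor, here under the "nsubsets" criterion
scores <- unclass(imp)[, "nsubsets"]
scores
```

Swapping "nsubsets" for "rss" or "gcv" in the last step would select the other two criteria named under importanceType.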

Value

A logical vector with length equal to ncol(X). Following the SuperLearner screening convention, TRUE marks the columns of X to keep.
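
To illustrate how importance scores could turn into such a vector, here is a minimal base-R sketch of the three selector rules. The scores and feature names are made up, the code is not FSelector's implementation, and rounding the proportion to a whole number of features is an assumption.

```r
# Hypothetical importance scores for four features (illustrative numbers only)
importance <- c(x1 = 3, x2 = 10, x3 = 1, x4 = 9)
sorted_scores <- sort(importance, decreasing = TRUE)
ranked <- names(sorted_scores)

# cutoff.k-style rule: keep the k highest-scoring features
k_count <- ceiling(0.5 * length(importance))  # same form as the default k for cutoff.k
keep_k <- ranked[seq_len(k_count)]

# cutoff.k.percent-style rule: keep a proportion of the features
# (rounding to a whole number of features is assumed here)
k_prop <- 0.75
keep_pct <- ranked[seq_len(round(k_prop * length(ranked)))]

# cutoff.biggest.diff-style rule: cut at the largest gap between
# consecutive sorted scores
gaps <- -diff(sorted_scores)
keep_gap <- ranked[seq_len(which.max(gaps))]

# Logical vector aligned with the original feature order
which_var <- names(importance) %in% keep_k
which_var  # FALSE TRUE FALSE TRUE
```

A vector of this form can be used to subset the predictors directly, for example X[, which_var, drop = FALSE] for a data.frame X whose columns are in the same order.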

Examples

# Binary outcome: keep the top 75% of features
data(iris)
Y <- as.numeric(iris$Species == "setosa")
X <- iris[, colnames(iris) != "Species"]
screen.earth(Y, X, binomial(), selector = "cutoff.k.percent", k = 0.75)

# Continuous outcome: rank features by the "rss" importance criterion
data(mtcars)
Y <- mtcars$mpg
X <- mtcars[, colnames(mtcars) != "mpg"]
screen.earth(Y, X, gaussian(), importanceType = "rss")

# based on examples in SuperLearner package
set.seed(1)
n <- 250
p <- 20
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)

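library(SuperLearner)
# Compare SL.glm on all features with SL.glm on the features kept by screen.earth
sl <- SuperLearner(Y, X, family = gaussian(), cvControl = list(V = 2),
                   SL.library = list(c("SL.glm", "All"),
                                     c("SL.glm", "screen.earth")))
sl
# Which features each screening algorithm retained
sl$whichScreen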

saraemoore/SLScreenExtra documentation built on Nov. 4, 2023, 9:31 p.m.