MSE_Test.default: Comparing Test MSEs for Full and Reduced Models

View source: R/MSE_Test_File.R

Description

Implements a permutation test that swaps trees between two forests: one trained with var left intact, and one trained with var replaced by a permuted copy of itself, where the permutation is done row-wise.
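
The idea can be sketched in base R. This is a minimal illustration on simulated per-tree predictions, not the package's implementation: the matrices pred_orig and pred_perm, the helper ensemble_mse, the sign of the difference, and the one-sided p-value formula are all assumptions made for the example.

```r
set.seed(1)

n_test <- 50    # number of test points (hypothetical)
n_tree <- 100   # trees per forest (hypothetical)
B      <- 500   # number of tree shuffles

y_test <- rnorm(n_test)

# Simulated per-tree predictions: rows = test points, columns = trees.
# The forest trained on the permuted variable is given a systematic bias.
pred_orig <- y_test + matrix(rnorm(n_test * n_tree, sd = 0.5), n_test, n_tree)
pred_perm <- y_test + matrix(rnorm(n_test * n_tree, mean = 0.5, sd = 0.5),
                             n_test, n_tree)

# MSE of an ensemble that averages the predictions of its trees.
ensemble_mse <- function(pred, y) mean((rowMeans(pred) - y)^2)

# Observed statistic: reduced-model MSE minus full-model MSE (assumed sign).
observed <- ensemble_mse(pred_perm, y_test) - ensemble_mse(pred_orig, y_test)

# Pool all trees, then repeatedly deal them at random into two forests.
pooled <- cbind(pred_orig, pred_perm)
perm_diffs <- replicate(B, {
  idx <- sample(ncol(pooled), n_tree)
  ensemble_mse(pooled[, -idx, drop = FALSE], y_test) -
    ensemble_mse(pooled[, idx, drop = FALSE], y_test)
})

# One-sided p-value: share of shuffled differences at least as large
# as the observed one, with the usual +1 correction.
p_value <- (1 + sum(perm_diffs >= observed)) / (B + 1)
```

If var carries no information, the two forests are exchangeable and the observed difference should look like a typical draw from perm_diffs; a large observed difference relative to that distribution is evidence that var matters.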

Usage

MSE_Test(X, y, X.test = FALSE, y.test = FALSE, var,
  NTest = nrow(X.test), B = 1000, NTree = 500, p = 1/2,
  base.learner = "rpart", mtry = ncol(X), importance = T,
  alpha = if (base.learner == "lm") 1,
  glm_cv = if (base.learner == "lm") "external" else "none",
  lambda = if (glm_cv == "none" & base.learner == "lm") 1 else NULL,
  ranger = F)

MSE_Test(formula, data, ...)

Arguments

X

Data frame of covariates - the training data.

y

Response vector. Currently only numeric responses (regression) are supported.

X.test

Covariates of the test set with which the MSE is calculated.

y.test

Responses in the test set with which the MSE is calculated.

base.learner

One of "rpart", "ctree", "rtree", or "lm". Base model to be used in the bagging.

NTree

Number of base learners.

mtry

Number of covariates considered as split candidates at each node, as in random forest models.

var

Variable of interest. Should correspond to the name of a variable in both X and X.test.

NTest

If X.test and y.test are not specified, this many test points are drawn at random from X and y to serve as the test set.

B

Number of permutations to use in the test. Note: this is the number of times the trees are permuted between forests to generate the permutation distribution, not the number of times each feature is permuted.

p

Fractional exponent of sample size, i.e. k = n^p observations are drawn.

importance

Logical. Should the standardized score of the test statistic (its "importance") be returned?

form

A "formula" object - no need to provide this by default.

alpha

Elastic net mixing parameter, used when base.learner = "lm" is chosen. Controls the balance between the LASSO and ridge penalties.

glm_cv

Should internal cross validation be performed on each Elastic Net model?

lambda

Regularization parameter if base.learner = "lm" is chosen.

ranger

Logical. If base.learner = "rtree" or base.learner = "ctree", should the models be fit as ranger objects instead of randomForest objects (for rtree) or cforest objects (for ctree)?
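
As a rough illustration of how p and NTest interact, the sketch below computes the subsample size k = n^p and draws a random hold-out set. It is only a sketch under assumptions: rounding k down, holding out the test rows before subsampling, and the index names test_idx, train_idx and sub_idx are choices made for the example and are not taken from the package source.

```r
set.seed(42)

n <- 1250   # training rows, matching the example on this page
p <- 0.85   # fractional exponent

# Subsample size; rounding down is an assumption.
k <- floor(n^p)

# Hold out NTest rows at random when no test set is supplied.
NTest     <- 100
test_idx  <- sample(n, NTest)
train_idx <- setdiff(seq_len(n), test_idx)

# Rows drawn for a single base learner (sampling without replacement).
sub_idx <- sample(train_idx, min(k, length(train_idx)))
```

With the default p = 1/2 the same n gives a much smaller subsample, since k grows only as the square root of n.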

Value

An object of the S4 class MSE_Test, containing:

var

Variable whose importance was tested, a name of a column in X.

originalStat

A named vector of two quantities: Original MSE, the MSE of the full model, and Permuted MSE, the MSE of the reduced model.

PermDiffs

A vector of the differences in permuted MSEs - these make up the permutation distribution.

Importance

A scalar: the standardized (SD) importance Z-score.

Pvalue

The p-value for the hypothesis tested.

test_pts

The test data frame.

weak_learner

The base models used in the ensemble.

model_original

The full model ensemble - list of base learners, like in bag.s.

model_permuted

The reduced model ensemble - list of base learners, like in bag.s.

test_stat

Which test statistic is used. Will always be "MSE" for this function.
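
Because the result is an S4 object, its components are read with the @ operator (for example fit@PermDiffs for a hypothetical fitted object fit). The sketch below shows one plausible way a standardized score could be derived from originalStat and PermDiffs. The numbers are made-up stand-ins, and the formula is an assumption about what "SD Importance Z-score" means, not the package's confirmed definition.

```r
set.seed(7)

# Made-up stand-ins for the originalStat and PermDiffs components.
original_stat <- c("Original MSE" = 0.110, "Permuted MSE" = 0.185)
perm_diffs    <- rnorm(1000, mean = 0, sd = 0.02)

# Observed difference: reduced-model MSE minus full-model MSE (assumed sign).
observed <- unname(original_stat["Permuted MSE"] - original_stat["Original MSE"])

# Assumed standardization: distance of the observed difference from the
# centre of the permutation distribution, in units of its standard deviation.
z_score <- (observed - mean(perm_diffs)) / sd(perm_diffs)
```

A score near zero would mean the observed difference is typical of the permutation distribution; larger positive values indicate that permuting var degraded the test MSE more than chance reshuffling of trees would explain.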

Author(s)

Tim Coleman

Examples

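library(dplyr)   # provides %>%, mutate() and select() used below
library(RFtest)  # provides MSE_Test()

N <- 1250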
Nvar <- 10
N_test <- 150
name_vec <- paste("X", 1:(2*Nvar), sep = "")

# training data:
X <- data.frame(replicate(Nvar, runif(N)),
                replicate(Nvar, cut(runif(N), 3,
                                      labels = as.character(1:3)))) %>%
  mutate(Y = 5*(X3) + .5*X2^2 + ifelse(X6 > 10*X1*X8*X9, 1, 0) +  rnorm(N, sd = .05))
names(X) <- c(name_vec, "Y")

# some testing data:
X.t1 <- data.frame(replicate(Nvar, runif(N_test)),
                   replicate(Nvar, cut(runif(N_test), 3,
                                       labels = as.character(1:3)))) %>%
  mutate(Y = 5*(X3) + .5*X2^2 + ifelse(X6 > 10*X1*X8*X9, 1, 0) +  rnorm(N_test, sd = .05))
names(X.t1) <- c(name_vec, "Y")

# Not specifying test points:
M_no_test <- MSE_Test(X = X %>% dplyr::select(-Y), y = X$Y,
                      base.learner = "lm", NTest = 100, NTree = 150, B = 1000, var = c( "X3"),
                      p = .85, glm_cv = T)

summary(M_no_test)

# Specifying test points:
M_test <- MSE_Test(X = X %>% select(-Y), y = X$Y, X.test = X.t1 %>% select(-Y), y.test = X.t1$Y,
                      base.learner = "ctree", NTree = 250, B = 1000, var = c( "X2"),
                      p = .85)
summary(M_test)

tim-coleman/RFtest documentation built on March 10, 2020, 12:28 p.m.