Description Usage Arguments Value Author(s) Examples
View source: R/MSE_Test_File.R
Implementation of a test which permutes trees between two forests: one with var left intact, and one with var replaced by a permuted version of itself, where the permutation is done row-wise.
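As a rough illustration of what "replaced with a permuted version of itself" can mean, the sketch below shuffles a single column of a toy data frame while leaving the other columns alone. The names `X_toy`, `var_toy` and `X_perm` are hypothetical and are not part of the package; how MSE_Test performs this step internally is not shown in this page.

```r
# Toy data frame; column names are illustrative only.
set.seed(1)
X_toy <- data.frame(X1 = runif(8), X2 = runif(8), X3 = runif(8))
var_toy <- "X3"

# Copy the data and shuffle the rows of the chosen column only.
X_perm <- X_toy
X_perm[[var_toy]] <- sample(X_toy[[var_toy]])

# The other columns are untouched; X3 keeps the same values
# but loses its row-wise alignment with the response.
identical(X_perm$X1, X_toy$X1)
all.equal(sort(X_perm$X3), sort(X_toy$X3))
```

Shuffling keeps the marginal distribution of the variable but breaks its association with the response, which is why a forest trained on the shuffled copy can serve as the reduced model.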
MSE_Test(X, y, X.test = FALSE, y.test = FALSE, var,
         NTest = nrow(X.test), B = 1000, NTree = 500, p = 1/2,
         base.learner = "rpart", mtry = ncol(X), importance = T,
         alpha = if (base.learner == "lm") 1,
         glm_cv = if (base.learner == "lm") "external" else "none",
         lambda = if (glm_cv == "none" & base.learner == "lm") 1 else NULL,
         ranger = F)

MSE_Test(formula, data, ...)
X | Data frame of covariates - the training data.
y | Response vector. Currently only numeric responses (regression) are supported.
X.test | Covariates of the test set with which the MSE is calculated.
y.test | Responses in the test set with which the MSE is calculated.
base.learner | One of
NTree | Number of base learners.
mtry |
var | Variable of interest. Should correspond to the name of a variable in both
NTest | If
B | Number of permutations to use in the test. Note: this is the number of times the trees are permuted between forests to generate the permutation distribution, not the number of times each feature is permuted.
p | Fractional exponent of sample size, i.e. k = n^p observations are drawn.
importance | Logical. Should the standardized score of the test statistic (its "importance") be returned?
form | A
alpha | Mixing parameter if
glm_cv | Should internal cross validation be performed on each Elastic Net model?
lambda | Regularization parameter if
ranger | If
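To make the role of p concrete, the sketch below evaluates k = n^p for a couple of values. The helper `subsample_size` is a hypothetical name, and rounding down with `floor()` is an assumption: this page does not say how the package turns n^p into an integer.

```r
# Hypothetical helper: subsample size k = n^p, rounded down (rounding rule assumed).
subsample_size <- function(n, p) floor(n^p)

n <- 1250
k_default <- subsample_size(n, 1/2)   # the default p = 1/2 gives roughly sqrt(n)
k_large   <- subsample_size(n, 0.85)  # a larger p draws more observations per learner

c(k_default = k_default, k_large = k_large)
```

Larger values of p mean each base learner sees more of the training data, at the cost of more overlap between the subsamples.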
An object of the S4 class MSE_Test
var | Variable whose importance was tested, a name of a column in
originalStat | A named vector of two quantities,
PermDiffs | A vector of the differences in permuted MSEs - these make up the permutation distribution.
Importance | A scalar of the SD Importance Z-score.
Pvalue | The p-value for the hypothesis tested.
test_pts | The test data frame.
weak_learner | The base models used in the ensemble.
model_original | The full model ensemble - list of base learners, like in
model_permuted | The reduced model ensemble - list of base learners, like in
test_stat | Which test statistic is used. Will always be
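The quantities above (the observed statistic, the permutation differences, the Z-score and the p-value) can be sketched in base R as follows. This is a minimal illustration on simulated per-tree predictions, not the package's code: the object names, the one-sided p-value with a +1 correction, and the form of the Z-score are all assumptions made for the example.

```r
set.seed(42)
n_test <- 50; n_tree <- 100; B <- 200
y_test <- rnorm(n_test)

# Simulated per-tree predictions (rows = test points, columns = trees).
# The "permuted" forest only recovers half of the signal, as if it lacked a useful variable.
noise <- function() rnorm(n_test * n_tree, sd = 0.5)
pred_orig <- matrix(y_test + noise(), n_test, n_tree)
pred_perm <- matrix(0.5 * y_test + noise(), n_test, n_tree)

# Forest prediction = average over trees; statistic = difference in test MSE.
forest_mse <- function(pred) mean((rowMeans(pred) - y_test)^2)
observed <- forest_mse(pred_perm) - forest_mse(pred_orig)

# Exchange trees between the two forests B times to build the null distribution.
pooled <- cbind(pred_orig, pred_perm)
perm_diffs <- replicate(B, {
  idx <- sample(ncol(pooled))
  forest_a <- pooled[, idx[1:n_tree], drop = FALSE]
  forest_b <- pooled[, idx[(n_tree + 1):(2 * n_tree)], drop = FALSE]
  forest_mse(forest_b) - forest_mse(forest_a)
})

# One-sided permutation p-value and a standardized score (both forms assumed).
p_value <- (1 + sum(perm_diffs >= observed)) / (B + 1)
z_score <- (observed - mean(perm_diffs)) / sd(perm_diffs)
```

If the variable carried no information, trees from the two forests would be exchangeable and the observed difference would look like a typical draw from `perm_diffs`; a large positive `z_score` or small `p_value` suggests otherwise.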
Tim Coleman
library(dplyr)  # provides %>%, mutate() and select() used below

N <- 1250
Nvar <- 10
N_test <- 150
name_vec <- paste("X", 1:(2*Nvar), sep = "")

# training data: 10 uniform covariates plus 10 three-level categorical covariates
X <- data.frame(replicate(Nvar, runif(N)),
                replicate(Nvar, cut(runif(N), 3,
                                    labels = as.character(1:3)))) %>%
  mutate(Y = 5*(X3) + .5*X2^2 + ifelse(X6 > 10*X1*X8*X9, 1, 0) + rnorm(N, sd = .05))
names(X) <- c(name_vec, "Y")

# some testing data, generated the same way:
X.t1 <- data.frame(replicate(Nvar, runif(N_test)),
                   replicate(Nvar, cut(runif(N_test), 3,
                                       labels = as.character(1:3)))) %>%
  mutate(Y = 5*(X3) + .5*X2^2 + ifelse(X6 > 10*X1*X8*X9, 1, 0) + rnorm(N_test, sd = .05))
names(X.t1) <- c(name_vec, "Y")

# Not specifying test points:
M_no_test <- MSE_Test(X = X %>% dplyr::select(-Y), y = X$Y,
                      base.learner = "lm", NTest = 100, NTree = 150, B = 1000,
                      var = "X3", p = .85, glm_cv = T)
summary(M_no_test)

# Specifying test points:
M_test <- MSE_Test(X = X %>% dplyr::select(-Y), y = X$Y,
                   X.test = X.t1 %>% dplyr::select(-Y), y.test = X.t1$Y,
                   base.learner = "ctree", NTree = 250, B = 1000,
                   var = "X2", p = .85)
summary(M_test)