BSstack: Bootstrap Stacking model builder.

View source: R/BSstack.R

Description

Builds a bootstrapped, linearly stacked ensemble of Random Forest (RF) models from a set of heterogeneous datasets.

Usage

BSstack(T = 50, mtry = NULL, nodesize = 5, iter = 25, CV = NA,
  Xn = NULL, ECHO = TRUE, Y, X1, X2, ...)

Arguments

T

Number of trees for the individual RF models. (int)

mtry

Number of variables available for splitting at each tree node. If a scalar is given, all models use that value. If a 1D array is given, each model uses its corresponding element. If left unset, it is set to Nfeats/3 for each model.

nodesize

Minimum size of terminal nodes. If a scalar is given, all models use that value. If a 1D array is given, each model uses its corresponding element. By default all models use 5.

iter

The number of times to bootstrap-sample the data. (int)

CV

Cross validation (CV) used to measure the mean-absolute error and the correlation coefficient. If NA (default), no CV is performed. Otherwise the value gives the number of folds. If CV < 2, leave-one-out CV is performed. CV uses only the samples that have a full record.

Xn

List containing each dataset to be stacked. If not supplied, it is generated from X1, X2, ...

ECHO

Logical. If TRUE, reports the overlapping samples and the CV runtime to the user.

Y

Nsample x 1 data table of responses for all samples. Its row names must match those of each individual dataset.

X1

Data table of first dataset to be stacked. Rownames should be contained within Y.

X2

Data table of second dataset to be stacked. Rownames should be contained within Y.

...

Further data tables, X3, X4, ..., Xl.
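
The row-name requirements above can be checked before calling BSstack. The following is a minimal base-R sketch on made-up toy data. The data, the object names (rows_ok, overlap) and the check itself are illustrative assumptions, not part of the package, and "full record" is read here as "present in every dataset".

```r
# Toy response table: 6 samples, one response column (illustrative data only).
Y <- data.frame(resp = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7),
                row.names = paste0("s", 1:6))

# Two toy datasets covering different, partly overlapping subsets of samples.
X1 <- data.frame(a = 1:5, b = 5:1, row.names = paste0("s", 1:5))
X2 <- data.frame(d = c(2, 4, 6, 8), row.names = paste0("s", 3:6))

# Collect the datasets in a list, as could be passed through the Xn argument.
Xn <- list(X1, X2)

# Every dataset's row names should be contained in the row names of Y.
rows_ok <- vapply(Xn, function(X) all(rownames(X) %in% rownames(Y)), logical(1))
stopifnot(all(rows_ok))

# Samples present in every dataset (assumed meaning of "overlapping samples").
overlap <- Reduce(intersect, lapply(Xn, rownames))
print(overlap)
```

Running such a check first makes a row-name mismatch fail early with a clear message instead of surfacing inside the model fit.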

Details

Required Packages: dplyr, randomForest, foreach

Value

If CV is not NA, a list composed of:

[1] A list containing [1] the individual RF models, [2] the Nstack + 1 weights, and [3] the feature names for the full-record samples. This element is what is passed to BSstack_predict.

[2] Mean-absolute error calculated using cross validation (scalar).

[3] Pearson correlation coefficient between actual and predicted values through cross validation (scalar, -1 <= r <= 1).

[4] Individual weights calculated for each fold (CV x (Nstack + 1) matrix).

[5] Out-of-fold predictions for the overlapping samples.

[6] Actual values for the overlapping samples.

[7] Fold assignments for the overlapping samples (only if CV > 1).

If CV is NA, only [1] is returned.
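
The two scalar CV metrics listed above follow standard definitions. As a minimal base-R sketch on made-up numbers (the vectors actual and predicted are hypothetical stand-ins for elements [6] and [5] of the returned list, and this is not the package's internal code):

```r
# Illustrative actual and out-of-fold predicted values (made-up numbers).
actual    <- c(1.0, 2.0, 3.0, 4.0)
predicted <- c(1.5, 1.5, 3.5, 3.5)

# Mean-absolute error: average magnitude of the prediction errors.
mae <- mean(abs(actual - predicted))

# Pearson correlation coefficient between actual and predicted values.
r <- cor(actual, predicted, method = "pearson")

print(c(mae = mae, r = r))
```

A lower mean-absolute error and a correlation closer to 1 both indicate better agreement between predictions and actual values.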

Examples

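library(Sstack)
library(doParallel)
data(StackData)

# StackData is a list; its elements are extracted by position
AUC <- StackData[[1]]   # responses, used as Y below
GE <- StackData[[2]]    # source of the feature subsets below
RPPA <- StackData[[3]]  # not used in this example

# Two datasets with different features and partially overlapping samples
X1 <- GE[1:400, 1:75]
X2 <- GE[200:400, 76:150]
# Held-out samples (rows 401-487) with all features, used for prediction
Xt <- GE[401:487, ]

set.seed(1)

# Register a parallel backend with 2 workers
cl <- makeCluster(2)
registerDoParallel(cl)

# Fit the bootstrapped stacked model (25 trees per RF, 20 bootstrap iterations)
Hbs <- BSstack(T = 25, iter = 20, Y = AUC, X1 = X1, X2 = X2)

stopCluster(cl)

# Predict the held-out samples
Yp <- BSstack_predict(Hbs[[1]], Xt)

# Mean-absolute error on the held-out samples, one value per column of Yp
maeH1 <- mean(abs(AUC[401:487, ] - Yp[, 1]))
maeH2 <- mean(abs(AUC[401:487, ] - Yp[, 2]))
maeHs <- mean(abs(AUC[401:487, ] - Yp[, 3]))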

Sstack documentation built on May 2, 2019, 5:39 a.m.