rfcca  R Documentation 
Estimates the canonical correlations between two sets of variables depending on the subject-related covariates.
rfcca(
  X,
  Y,
  Z,
  ntree = 200,
  mtry = NULL,
  nodesize = NULL,
  nodedepth = NULL,
  nsplit = 10,
  importance = FALSE,
  finalcca = c("cca", "scca", "rcca"),
  bootstrap = TRUE,
  samptype = c("swor", "swr"),
  sampsize = if (samptype == "swor") function(x) { x * 0.632 } else function(x) { x },
  forest = TRUE,
  membership = FALSE,
  bop = TRUE,
  Xcenter = TRUE,
  Ycenter = TRUE,
  ...
)
X 
The first multivariate data set which has n observations and px variables. A data.frame of numeric values. 
Y 
The second multivariate data set which has n observations and py variables. A data.frame of numeric values. 
Z 
The set of subject-related covariates which has n observations and pz variables. Used in random forest growing. A data.frame with numeric values and factors. 
ntree 
Number of trees. 
mtry 
Number of z-variables randomly selected as candidates for splitting a node. The default is pz/3, where pz is the number of z-variables. Values are always rounded up. 
nodesize 
Forest average number of unique data points in a terminal node. The default is 3 * (px + py), where px and py are the number of x and y variables, respectively. 
nodedepth 
Maximum depth to which a tree should be grown. By default, this parameter is ignored. 
nsplit 
Non-negative integer value for the number of random splits to consider for each candidate splitting variable. When zero or 
importance 
Should variable importance of z-variables be assessed? The default is FALSE. 
finalcca 
Which CCA should be used for final canonical correlation estimation? Choices are "cca", "scca" and "rcca". 
bootstrap 
Should the data be bootstrapped? The default value is TRUE. 
samptype 
Type of bootstrap. Choices are "swor" (sampling without replacement) and "swr" (sampling with replacement). 
sampsize 
Size of sample to draw. For sampling without replacement, the default is 0.632 times the sample size. For sampling with replacement, it is the sample size. 
forest 
Should the forest object be returned? It is used for prediction on new data. The default is TRUE. 
membership 
Should terminal node membership and inbag information be returned? 
bop 
Should the Bag of Observations for Prediction (BOP) for training observations be returned? The default is TRUE. 
Xcenter 
Should the columns of X be centered? The default is TRUE. 
Ycenter 
Should the columns of Y be centered? The default is TRUE. 
... 
Optional arguments to be passed to other methods. 
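The documented defaults for mtry, nodesize and sampsize can be reproduced with plain arithmetic. The snippet below is a minimal sketch in base R; the dimensions px, py, pz and n are hypothetical values chosen only for illustration, and how the package rounds a fractional sample size is not stated here, so the raw value is shown.

```r
# Hypothetical dimensions, chosen only for illustration
px <- 2    # number of x-variables
py <- 2    # number of y-variables
pz <- 7    # number of z-variables
n  <- 300  # number of observations

# mtry default as documented: pz / 3, rounded up
mtry_default <- ceiling(pz / 3)

# nodesize default as documented: 3 * (px + py)
nodesize_default <- 3 * (px + py)

# sampsize defaults, copied from the usage signature
sampsize_swor <- function(x) { x * 0.632 }  # sampling without replacement
sampsize_swr  <- function(x) { x }          # sampling with replacement

print(mtry_default)
print(nodesize_default)
print(sampsize_swor(n))
print(sampsize_swr(n))
```

With these hypothetical dimensions, three z-variables would be tried at each node and terminal nodes would hold about twelve unique observations on average.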
An object of class (rfcca, grow), which is a list with the following components:
call 
The original call to rfcca. 
n 
Sample size of the data ( 
ntree 
Number of trees grown. 
mtry 
Number of variables randomly selected for splitting at each node. 
nodesize 
Minimum forest average number of unique data points in a terminal node. 
nodedepth 
Maximum depth to which a tree is allowed to be grown. 
nsplit 
Number of randomly selected split points. 
xvar 
Data frame of x-variables. 
xvar.names 
A character vector of the x-variable names. 
yvar 
Data frame of y-variables. 
yvar.names 
A character vector of the y-variable names. 
zvar 
Data frame of z-variables. 
zvar.names 
A character vector of the z-variable names. 
leaf.count 
Number of terminal nodes for each tree in the forest. Vector of length ntree. 
bootstrap 
Was the data bootstrapped? 
forest 
If forest = TRUE, the forest object is returned. It is used for prediction on new data. 
membership 
A matrix recording terminal node membership where each cell represents the node number that an observation falls in for that tree. 
importance 
Variable importance measures (VIMP) for each z-variable. 
inbag 
A matrix recording inbag membership where each cell represents whether the observation is in the bootstrap sample in the corresponding tree. 
predicted.oob 
OOB predicted canonical correlations for training observations based on the selected final canonical correlation estimation method. 
predicted.coef 
Predicted canonical weight vectors for x and y variables. 
bop 
If bop = TRUE, the Bag of Observations for Prediction (BOP) for training observations is returned. 
finalcca 
The selected CCA used for final canonical correlation estimations. 
rfsrc.grow 
An object of class 
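To make the inbag component and the OOB predictions more concrete, the sketch below builds a toy in-bag matrix in base R and derives the out-of-bag (OOB) observations from it. It does not call the package. The 0/1 encoding of the matrix, the rounding of the sample size and the names inbag, oob and size are assumptions of this sketch, not the package's internals.

```r
set.seed(1)
n     <- 10  # hypothetical number of training observations
ntree <- 5   # hypothetical number of trees

# "swor" sample size: 0.632 * n; rounding is an assumption of this sketch
size <- round(n * 0.632)

# inbag[i, b] is 1 if observation i was drawn for tree b, and 0 otherwise
inbag <- sapply(seq_len(ntree), function(b) {
  drawn <- sample(n, size = size, replace = FALSE)
  as.integer(seq_len(n) %in% drawn)
})

# An observation is out-of-bag for a tree when it was not drawn for that tree
oob <- inbag == 0

print(colSums(inbag))  # in-bag observations per tree
print(rowSums(oob))    # number of trees for which each observation is OOB
```

Under this reading, an OOB prediction for a training observation would use only the trees in which that observation is out-of-bag.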
Final canonical correlation can be computed with CCA (Hotelling, 1936), Sparse CCA (Witten et al., 2009) or Regularized CCA (Vinod, 1976; Leurgans et al., 1993). If Regularized CCA is used, λ_1 and λ_2 should be specified.
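As a rough illustration of the difference between classical and regularized CCA, the sketch below uses only base R on synthetic data. Classical CCA is computed with stats::cancor. The ridge-type version adds λ_1 and λ_2 to the diagonals of the covariance matrices of X and Y, which is one common formulation; whether rfcca uses exactly this formulation internally is an assumption. The helper names inv_sqrt and ridge_cc are hypothetical.

```r
set.seed(2345)
n <- 200

# Synthetic data: X and Y share a latent signal through their first columns
latent <- rnorm(n)
X <- cbind(x1 = latent + rnorm(n), x2 = rnorm(n))
Y <- cbind(y1 = latent + rnorm(n), y2 = rnorm(n))

# Classical CCA with base R, centering both data sets
fit <- cancor(X, Y, xcenter = TRUE, ycenter = TRUE)

# Inverse square root of a symmetric positive-definite matrix
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Ridge-type canonical correlations (hypothetical helper);
# lambda1 = lambda2 = 0 reduces to classical CCA
ridge_cc <- function(X, Y, lambda1 = 0, lambda2 = 0) {
  Cxx <- cov(X) + lambda1 * diag(ncol(X))
  Cyy <- cov(Y) + lambda2 * diag(ncol(Y))
  Cxy <- cov(X, Y)
  svd(inv_sqrt(Cxx) %*% Cxy %*% inv_sqrt(Cyy))$d
}

rho_cca  <- ridge_cc(X, Y)
rho_rcca <- ridge_cc(X, Y, lambda1 = 0.1, lambda2 = 0.1)

print(fit$cor)   # canonical correlations from cancor
print(rho_cca)   # same values from the covariance-based computation
print(rho_rcca)  # shrunken values under the ridge penalties
```

With both penalties at zero the covariance-based computation agrees with cancor, and positive penalties can only shrink the leading canonical correlation.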
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 725–740.
Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.
predict.rfcca
global.significance
vimp.rfcca
print.rfcca
## load generated example data
data(data, package = "RFCCA")
set.seed(2345)

## define train/test split
smp <- sample(1:nrow(data$X), size = round(nrow(data$X) * 0.7), replace = FALSE)
train.data <- lapply(data, function(x) {x[smp, ]})
## test set: the rows that were not sampled for training
test.Z <- data$Z[-smp, ]

## train rfcca
rfcca.obj <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
                   ntree = 100, importance = TRUE)

## print the grow object
print(rfcca.obj)

## get the OOB predictions
pred.oob <- rfcca.obj$predicted.oob

## predict with new test data
pred.obj <- predict(rfcca.obj, newdata = test.Z)
pred <- pred.obj$predicted

## get the variable importance measures
z.vimp <- rfcca.obj$importance

## train rfcca and estimate the final canonical correlations with "scca"
rfcca.obj2 <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
                    ntree = 100, finalcca = "scca")