# rfcca: Random Forest with Canonical Correlation Analysis

 rfcca R Documentation

## Random Forest with Canonical Correlation Analysis

### Description

Estimates the canonical correlations between two sets of variables depending on the subject-related covariates.

### Usage

```
rfcca(
  X,
  Y,
  Z,
  ntree = 200,
  mtry = NULL,
  nodesize = NULL,
  nodedepth = NULL,
  nsplit = 10,
  importance = FALSE,
  finalcca = c("cca", "scca", "rcca"),
  bootstrap = TRUE,
  samptype = c("swor", "swr"),
  sampsize = if (samptype == "swor") function(x) { x * 0.632 } else function(x) { x },
  forest = TRUE,
  membership = FALSE,
  bop = TRUE,
  Xcenter = TRUE,
  Ycenter = TRUE,
  ...
)
```

### Arguments

| Argument | Description |
|---|---|
| `X` | The first multivariate data set, with n observations and px variables. A data.frame of numeric values. |
| `Y` | The second multivariate data set, with n observations and py variables. A data.frame of numeric values. |
| `Z` | The set of subject-related covariates, with n observations and pz variables. Used to grow the random forest. A data.frame of numeric values and factors. |
| `ntree` | Number of trees. |
| `mtry` | Number of z-variables randomly selected as candidates for splitting a node. The default is pz/3, where pz is the number of z-variables. Values are always rounded up. |
| `nodesize` | Forest average number of unique data points in a terminal node. The default is 3 * (px + py), where px and py are the numbers of x- and y-variables, respectively. |
| `nodedepth` | Maximum depth to which a tree should be grown. By default, this parameter is ignored. |
| `nsplit` | Non-negative integer giving the number of random splits to consider for each candidate splitting variable. When zero or `NULL`, all possible splits are considered. |
| `importance` | Should variable importance of z-variables be assessed? The default is `FALSE`. |
| `finalcca` | Which CCA should be used for final canonical correlation estimation? Choices are `cca`, `scca` and `rcca`; see Details. The default is `cca`. |
| `bootstrap` | Should the data be bootstrapped? The default is `TRUE`, which bootstraps the data by sampling without replacement. If `FALSE`, the data is not bootstrapped, and OOB predictions and variable importance measures cannot be returned. |
| `samptype` | Type of bootstrap. Choices are `swor` (sampling without replacement, i.e. sub-sampling) and `swr` (sampling with replacement, i.e. bootstrapping). The default (as in `randomForestSRC`) is sampling without replacement. |
| `sampsize` | Size of the sample to draw. For sampling without replacement, the default is 0.632 times the sample size. For sampling with replacement, it is the sample size. |
| `forest` | Should the forest object be returned? It is used for prediction on new data. The default is `TRUE`. |
| `membership` | Should terminal node membership and inbag information be returned? |
| `bop` | Should the Bag of Observations for Prediction (BOP) for training observations be returned? The default is `TRUE`. |
| `Xcenter` | Should the columns of X be centered? The default is `TRUE`. |
| `Ycenter` | Should the columns of Y be centered? The default is `TRUE`. |
| `...` | Optional arguments to be passed to other methods. |
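
The data-dependent defaults described above can be worked through in a few lines of base R. This is only a sketch: `rfcca_defaults` is a hypothetical helper written for illustration and is not part of RFCCA, and how the package rounds the resulting `sampsize` is not stated here, so the value is left unrounded.

```r
# Hypothetical helper (not part of RFCCA): evaluates the documented defaults
# for mtry, nodesize and sampsize from the data dimensions.
rfcca_defaults <- function(n, px, py, pz, samptype = c("swor", "swr")) {
  samptype <- match.arg(samptype)
  # Same branching as the sampsize default shown in Usage
  sampsize_fun <- if (samptype == "swor") function(x) x * 0.632 else function(x) x
  list(
    mtry     = ceiling(pz / 3),   # pz / 3, rounded up
    nodesize = 3 * (px + py),     # 3 * (px + py)
    sampsize = sampsize_fun(n)    # 0.632 * n for "swor", n for "swr"
  )
}

# Example: 300 observations, 2 x-variables, 2 y-variables, 7 z-variables
d <- rfcca_defaults(n = 300, px = 2, py = 2, pz = 7)
d$mtry      # 3
d$nodesize  # 12
d$sampsize  # 189.6
```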

### Value

An object of class `(rfcca,grow)` which is a list with the following components:

| Component | Description |
|---|---|
| `call` | The original call to `rfcca`. |
| `n` | Sample size of the data (`NA`s are omitted). |
| `ntree` | Number of trees grown. |
| `mtry` | Number of variables randomly selected for splitting at each node. |
| `nodesize` | Minimum forest average number of unique data points in a terminal node. |
| `nodedepth` | Maximum depth to which a tree is allowed to be grown. |
| `nsplit` | Number of randomly selected split points. |
| `xvar` | Data frame of x-variables. |
| `xvar.names` | A character vector of the x-variable names. |
| `yvar` | Data frame of y-variables. |
| `yvar.names` | A character vector of the y-variable names. |
| `zvar` | Data frame of z-variables. |
| `zvar.names` | A character vector of the z-variable names. |
| `leaf.count` | Number of terminal nodes for each tree in the forest. Vector of length `ntree`. |
| `bootstrap` | Was the data bootstrapped? |
| `forest` | If `forest=TRUE`, the `rfcca` forest object is returned. This object is used for prediction with new data. |
| `membership` | A matrix recording terminal node membership, where each cell gives the number of the node that an observation falls in for that tree. |
| `importance` | Variable importance measures (VIMP) for each z-variable. |
| `inbag` | A matrix recording inbag membership, where each cell indicates whether the observation is in the bootstrap sample of the corresponding tree. |
| `predicted.oob` | OOB predicted canonical correlations for training observations, based on the selected final canonical correlation estimation method. |
| `predicted.coef` | Predicted canonical weight vectors for x- and y-variables. |
| `bop` | If `bop=TRUE`, a list containing the BOP for each training observation is returned. |
| `finalcca` | The selected CCA used for final canonical correlation estimation. |
| `rfsrc.grow` | An object of class `(rfsrc,grow)`. This object is used for prediction with training or new data. |
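
To make the `inbag` component and the notion of OOB (out-of-bag) observations concrete, the following base-R sketch builds an inbag indicator matrix by sub-sampling without replacement. It is an illustration only and does not call RFCCA: the logical encoding, the observations-by-trees layout, and the rounding of the sample size are all assumptions made for the example.

```r
set.seed(42)
n <- 10       # number of observations
ntree <- 5    # number of trees
sampsize <- round(n * 0.632)  # assumed rounding: 6 of 10 observations per tree

# inbag[i, b] is TRUE when observation i is drawn into the subsample of tree b
# (sample() draws without replacement by default, i.e. "swor")
inbag <- sapply(seq_len(ntree), function(b) seq_len(n) %in% sample(n, sampsize))

# Each tree sees exactly sampsize observations
colSums(inbag)

# For each observation, the trees for which it is out-of-bag; only these trees
# can contribute to that observation's OOB prediction
oob_trees <- lapply(seq_len(n), function(i) which(!inbag[i, ]))
```

Without any resampling every observation is in every tree's sample, so no tree is out-of-bag for any observation, which is consistent with OOB predictions being unavailable when `bootstrap = FALSE`.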

### Details

Final canonical correlation estimation:

The final canonical correlation can be computed with CCA (Hotelling, 1936), Sparse CCA (Witten et al., 2009) or Regularized CCA (Vinod, 1976; Leurgans et al., 1993). If Regularized CCA is used, the regularization parameters λ_1 and λ_2 should be specified.
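
As a rough illustration of how classical and regularized CCA differ, the sketch below computes the first canonical correlation in base R, adding ridge penalties `lambda1` and `lambda2` to the two covariance matrices. `first_cancor` is a hypothetical helper written for this example; it is one common formulation of regularized CCA and is not claimed to be how RFCCA computes its estimates, and the simulated data are made up.

```r
# Hypothetical helper: first canonical correlation with optional ridge penalties.
# lambda1 = lambda2 = 0 reduces to classical CCA.
first_cancor <- function(X, Y, lambda1 = 0, lambda2 = 0) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  Y <- scale(as.matrix(Y), center = TRUE, scale = FALSE)
  n <- nrow(X)
  Sxx <- crossprod(X) / (n - 1) + lambda1 * diag(ncol(X))
  Syy <- crossprod(Y) / (n - 1) + lambda2 * diag(ncol(Y))
  Sxy <- crossprod(X, Y) / (n - 1)
  # Inverse matrix square root via the eigendecomposition
  inv_sqrt <- function(S) {
    e <- eigen(S, symmetric = TRUE)
    e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
  }
  # Largest singular value of Sxx^(-1/2) Sxy Syy^(-1/2)
  svd(inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy))$d[1]
}

# Simulated data: the first columns of X and Y share a latent factor
set.seed(1)
n <- 200
latent <- rnorm(n)
X <- cbind(latent + rnorm(n), rnorm(n))
Y <- cbind(latent + rnorm(n), rnorm(n))

rho_cca  <- first_cancor(X, Y)                              # classical CCA
rho_rcca <- first_cancor(X, Y, lambda1 = 0.1, lambda2 = 0.1) # ridge-penalized

# The unpenalized value agrees with stats::cancor
all.equal(rho_cca, cancor(X, Y)$cor[1])
```

In this formulation the penalties enlarge both variance terms in the denominator, so the penalized correlation can never exceed the classical one.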

### References

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.

Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 725–740.

Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.

Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.

### See Also

`predict.rfcca` `global.significance` `vimp.rfcca` `print.rfcca`

### Examples

```
data(data, package = "RFCCA")
set.seed(2345)

## define train/test split
smp <- sample(1:nrow(data$X), size = round(nrow(data$X) * 0.7),
              replace = FALSE)
train.data <- lapply(data, function(x) {x[smp, ]})
test.Z <- data$Z[-smp, ]

## train rfcca
rfcca.obj <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
                   ntree = 100, importance = TRUE)

## print the grow object
print(rfcca.obj)

## get the OOB predictions
pred.oob <- rfcca.obj$predicted.oob

## predict with new test data
pred.obj <- predict(rfcca.obj, newdata = test.Z)
pred <- pred.obj$predicted

## get the variable importance measures
z.vimp <- rfcca.obj$importance

## train rfcca and estimate the final canonical correlations with "scca"
rfcca.obj2 <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
                    ntree = 100, finalcca = "scca")
```

RFCCA documentation built on April 13, 2022, 9:06 a.m.