global.significance: Global significance test
In RFCCA: Random Forest with Canonical Correlation Analysis

Description

This function runs a permutation test to evaluate the global effect of the subject-related covariates (Z) and returns an estimated p-value.

Usage

global.significance(
  X,
  Y,
  Z,
  ntree = 200,
  mtry = NULL,
  nperm = 500,
  nodesize = NULL,
  nodedepth = NULL,
  nsplit = 10,
  Xcenter = TRUE,
  Ycenter = TRUE
)

Arguments

`X`	The first multivariate data set which has `n` observations and `px` variables. A data.frame of numeric values.
`Y`	The second multivariate data set which has `n` observations and `py` variables. A data.frame of numeric values.
`Z`	The set of subject-related covariates which has `n` observations and `pz` variables. Used in random forest growing. A data.frame with numeric values and factors.
`ntree`	Number of trees.
`mtry`	Number of z-variables randomly selected as candidates for splitting a node. The default is `pz/3` where `pz` is the number of z variables. Values are always rounded up.
`nperm`	Number of permutations.
`nodesize`	Forest average number of unique data points in a terminal node. The default is `3 * (px+py)`, where `px` and `py` are the number of x and y variables, respectively.
`nodedepth`	Maximum depth to which a tree should be grown. In the default, this parameter is ignored.
`nsplit`	Non-negative integer value for the number of random splits to consider for each candidate splitting variable. When zero or `NULL`, all possible splits are considered.
`Xcenter`	Should the columns of X be centered? The default is `TRUE`.
`Ycenter`	Should the columns of Y be centered? The default is `TRUE`.
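
The documented defaults for `mtry` and `nodesize` depend only on the data dimensions. The following is a minimal sketch of that arithmetic, assuming hypothetical dimensions chosen for illustration; the names `mtry_default` and `nodesize_default` are not part of the package.

```r
# Hypothetical dimensions, chosen only for illustration
px <- 2   # number of x-variables
py <- 2   # number of y-variables
pz <- 7   # number of z-variables

# Documented default for mtry: pz/3, rounded up
mtry_default <- ceiling(pz / 3)

# Documented default for nodesize: 3 * (px + py)
nodesize_default <- 3 * (px + py)

print(c(mtry = mtry_default, nodesize = nodesize_default))
```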

Value

An object of class `(rfcca, globalsignificance)`, which is a list with the following components:

`call`	The original call to `global.significance`.
`pvalue`	p-value, see below for details.
`n`	Sample size of the data (`NA`'s are omitted).
`ntree`	Number of trees grown.
`nperm`	Number of permutations.
`mtry`	Number of variables randomly selected for splitting at each node.
`nodesize`	Minimum forest average number of unique data points in a terminal node.
`nodedepth`	Maximum depth to which a tree is allowed to be grown.
`nsplit`	Number of randomly selected split points.
`xvar`	Data frame of x-variables.
`xvar.names`	A character vector of the x-variable names.
`yvar`	Data frame of y-variables.
`yvar.names`	A character vector of the y-variable names.
`zvar`	Data frame of z-variables.
`zvar.names`	A character vector of the z-variable names.
`predicted.oob`	OOB predicted canonical correlations for training observations based on the selected final canonical correlation estimation method.
`predicted.perm`	Predicted canonical correlations for the permutations. A matrix of predictions with observations on the rows and permutations on the columns.

Details

We perform a hypothesis test to evaluate the global effect of the subject-related covariates on the canonical correlation between X and Y. Define the unconditional canonical correlation between X and Y as \rho_{CCA}(X,Y), which is found by computing CCA with all X and Y, and the conditional canonical correlation between X and Y given Z as \rho(X,Y | Z), which is found by rfcca(). If Z has a global effect on the correlation between X and Y, \rho(X,Y | Z) should differ significantly from \rho_{CCA}(X,Y). We conduct a permutation test for the null hypothesis

H_0 : \rho(X,Y | Z) = \rho_{CCA}(X,Y)

We estimate a p-value with the permutation test. If the p-value is less than the pre-specified significance level \alpha, we reject the null hypothesis.
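
The logic of the test can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, `conditional_cc()` is a crude hypothetical stand-in for `rfcca()` (one CCA per half of the sample, split at the median of `z1`), and the test statistic (mean squared difference between the conditional estimates and the unconditional canonical correlation) is an assumption rather than something stated on this page. Only `cancor()` from the `stats` package is a real library call here.

```r
set.seed(2345)
n <- 300

# Simulated data (hypothetical): X and Y are correlated only when z1 > 0.5
Z <- data.frame(z1 = runif(n), z2 = rnorm(n))
latent <- rnorm(n)
w <- ifelse(Z$z1 > 0.5, 1, 0)
X <- cbind(x1 = w * latent + rnorm(n), x2 = rnorm(n))
Y <- cbind(y1 = w * latent + rnorm(n), y2 = rnorm(n))

# Largest canonical correlation, computed with stats::cancor()
first_cc <- function(X, Y) cancor(X, Y)$cor[1]

# Stand-in for rfcca(): one CCA per half of the sample, split at median z1.
# This is NOT the random forest estimator, only a placeholder.
conditional_cc <- function(X, Y, Z) {
  grp <- Z$z1 > median(Z$z1)
  est <- numeric(nrow(Z))
  est[grp]  <- first_cc(X[grp, , drop = FALSE],  Y[grp, , drop = FALSE])
  est[!grp] <- first_cc(X[!grp, , drop = FALSE], Y[!grp, , drop = FALSE])
  est
}

# Unconditional canonical correlation using all observations
rho_cca <- first_cc(X, Y)

# Assumed statistic: mean squared gap between conditional and unconditional
stat <- function(Zdata) mean((conditional_cc(X, Y, Zdata) - rho_cca)^2)
t_obs <- stat(Z)

# Permute the rows of Z to break any link between Z and (X, Y)
nperm <- 200
t_perm <- replicate(nperm, stat(Z[sample(n), , drop = FALSE]))

# Estimated p-value: share of permuted statistics at least as large as observed
pvalue <- mean(t_perm >= t_obs)
print(pvalue)
```

Permuting the rows of Z while leaving X and Y untouched keeps \rho_{CCA}(X,Y) fixed and simulates the situation in which Z carries no information about the correlation, which is what the null hypothesis describes.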

Examples


## load generated example data
data(data, package = "RFCCA")
set.seed(2345)

global.significance(X = data$X, Y = data$Y, Z = data$Z, ntree = 40,
  nperm = 5)
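
The returned list can also be stored and inspected. The sketch below reuses the call above and reads the `pvalue` and `predicted.perm` components listed under Value; the object name `res` and the `requireNamespace()` guard are additions for illustration. With only 5 permutations the p-value is a coarse estimate.

```r
# Run only if the RFCCA package is installed
if (requireNamespace("RFCCA", quietly = TRUE)) {
  data(data, package = "RFCCA")
  set.seed(2345)

  # Store the result instead of printing it directly
  res <- RFCCA::global.significance(X = data$X, Y = data$Y, Z = data$Z,
                                    ntree = 40, nperm = 5)

  # Estimated p-value of the global test
  print(res$pvalue)

  # Predictions for the permutations: observations in rows, permutations in columns
  print(dim(res$predicted.perm))
}
```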

RFCCA documentation built on May 29, 2024, 6:06 a.m.