global.significance: Global significance test

global.significance R Documentation

Global significance test

Description

This function runs a permutation test to evaluate the global effect of the subject-related covariates (Z) and returns an estimated p-value.

Usage

global.significance(
  X,
  Y,
  Z,
  ntree = 200,
  mtry = NULL,
  nperm = 500,
  nodesize = NULL,
  nodedepth = NULL,
  nsplit = 10,
  Xcenter = TRUE,
  Ycenter = TRUE
)

Arguments

X

The first multivariate data set which has n observations and px variables. A data.frame of numeric values.

Y

The second multivariate data set which has n observations and py variables. A data.frame of numeric values.

Z

The set of subject-related covariates which has n observations and pz variables. Used in random forest growing. A data.frame with numeric values and factors.

ntree

Number of trees.

mtry

Number of z-variables randomly selected as candidates for splitting a node. The default is pz/3 where pz is the number of z variables. Values are always rounded up.

nperm

Number of permutations.

nodesize

Forest average number of unique data points in a terminal node. The default is 3 * (px+py), where px and py are the number of x and y variables, respectively.

nodedepth

Maximum depth to which a tree should be grown. By default, this parameter is ignored.

nsplit

Non-negative integer value for the number of random splits to consider for each candidate splitting variable. When zero or NULL, all possible splits are considered.

Xcenter

Should the columns of X be centered? The default is TRUE.

Ycenter

Should the columns of Y be centered? The default is TRUE.
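
The default values documented above for mtry and nodesize can be worked out by hand. The following is a minimal sketch of that arithmetic only, not the package's internal code; the dimensions px, py and pz are hypothetical values chosen for illustration.

```r
# Hypothetical dimensions, chosen only for illustration
px <- 2  # number of x variables
py <- 2  # number of y variables
pz <- 7  # number of z variables

# Documented default for mtry: pz/3, always rounded up
mtry_default <- ceiling(pz / 3)

# Documented default for nodesize: 3 * (px + py)
nodesize_default <- 3 * (px + py)

mtry_default
nodesize_default
```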

Value

An object of class (rfcca, globalsignificance), which is a list with the following components:

call

The original call to global.significance.

pvalue

The estimated p-value; see Details below.

n

Sample size of the data (NA's are omitted).

ntree

Number of trees grown.

nperm

Number of permutations.

mtry

Number of variables randomly selected for splitting at each node.

nodesize

Minimum forest average number of unique data points in a terminal node.

nodedepth

Maximum depth to which a tree is allowed to be grown.

nsplit

Number of randomly selected split points.

xvar

Data frame of x-variables.

xvar.names

A character vector of the x-variable names.

yvar

Data frame of y-variables.

yvar.names

A character vector of the y-variable names.

zvar

Data frame of z-variables.

zvar.names

A character vector of the z-variable names.

predicted.oob

OOB predicted canonical correlations for training observations based on the selected final canonical correlation estimation method.

predicted.perm

Predicted canonical correlations for the permutations. A matrix of predictions with observations on the rows and permutations on the columns.

Details

We perform a hypothesis test to evaluate the global effect of the subject-related covariates on distinguishing between canonical correlations. Define the unconditional canonical correlation between X and Y as \rho_{CCA}(X,Y), which is found by computing CCA with all X and Y, and the conditional canonical correlation between X and Y given Z as \rho(X,Y | Z), which is found by rfcca(). If Z has a global effect on the correlations between X and Y, \rho(X,Y | Z) should be significantly different from \rho_{CCA}(X,Y). We conduct a permutation test for the null hypothesis

H_0 : \rho(X,Y | Z) = \rho_{CCA}(X,Y)

We estimate a p-value with the permutation test. If the p-value is less than the pre-specified significance level \alpha, we reject the null hypothesis.
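
As a rough illustration of the unconditional quantity \rho_{CCA}(X,Y), the first canonical correlation between two data sets can be computed in base R with stats::cancor(). This sketch uses simulated data with hypothetical variable names and is not necessarily how the package computes the quantity internally.

```r
set.seed(1)
n <- 200

# Simulated data: x1 and y1 share a common latent signal, x2 and y2 are noise
latent <- rnorm(n)
X <- data.frame(x1 = latent + rnorm(n), x2 = rnorm(n))
Y <- data.frame(y1 = latent + rnorm(n), y2 = rnorm(n))

# Classical CCA on all observations, with centered columns
cca <- cancor(as.matrix(X), as.matrix(Y), xcenter = TRUE, ycenter = TRUE)

# Canonical correlations are returned in decreasing order; take the first
rho_cca <- cca$cor[1]
rho_cca
```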

See Also

rfcca, predict.rfcca, print.rfcca

Examples


## load generated example data
data(data, package = "RFCCA")
set.seed(2345)

global.significance(X = data$X, Y = data$Y, Z = data$Z, ntree = 40,
  nperm = 5)
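
The returned list can be stored and its components inspected. The sketch below reuses the call from the example and reads only components listed under Value; the object name fit is an arbitrary choice, and the block is skipped when the RFCCA package is not installed.

```r
if (requireNamespace("RFCCA", quietly = TRUE)) {
  ## load generated example data
  data(data, package = "RFCCA")
  set.seed(2345)

  ## store the result instead of only printing it
  fit <- RFCCA::global.significance(X = data$X, Y = data$Y, Z = data$Z,
    ntree = 40, nperm = 5)

  ## estimated p-value of the global test
  print(fit$pvalue)

  ## predictions for the permutations: observations on the rows,
  ## permutations on the columns
  print(dim(fit$predicted.perm))
}
```

Small values of ntree and nperm keep the example fast; the function defaults (ntree = 200, nperm = 500) are larger.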



RFCCA documentation built on Sept. 19, 2023, 9:06 a.m.