cca_splithalf: Split-Half CCA code


View source: R/analysis_functions.R

Description

Runs a CCA model in the training dataset and validates its performance in the testing dataset. The function estimates confidence intervals and p-values using standard inferential methods for Pearson's correlation coefficient.
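The split-half idea described above can be sketched with base R alone. This is an illustration under stated assumptions, not the package's internal code: it uses stats::cancor() for the CCA fit and stats::cor.test() for the Pearson-based intervals and p-values, and the simulated data and every object name (X, Y, x_var, y_var, results) are made up for the example.

```r
set.seed(1)

# Simulated data (illustrative only): a shared latent signal links X and Y.
n <- 200; p1 <- 4; p2 <- 3
latent <- rnorm(2 * n)
X <- matrix(rnorm(2 * n * p1), ncol = p1) + latent
Y <- matrix(rnorm(2 * n * p2), ncol = p2) + latent

# Split the rows into a training half and a testing half.
train <- seq_len(n)
test  <- n + seq_len(n)

# Fit CCA in the training half only.
fit <- cancor(X[train, ], Y[train, ])

# Apply the training coefficients to the testing half
# (centring with the training means stored by cancor()).
ncomp <- 2
x_var <- scale(X[test, ], center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, 1:ncomp]
y_var <- scale(Y[test, ], center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, 1:ncomp]

# Out-of-sample canonical correlations with 100(1 - alpha)% CIs and p-values.
alpha <- 0.05
results <- t(sapply(seq_len(ncomp), function(k) {
  ct <- cor.test(x_var[, k], y_var[, k], conf.level = 1 - alpha)
  c(r = unname(ct$estimate), lower = ct$conf.int[1],
    upper = ct$conf.int[2], p = ct$p.value)
}))
print(results)
```

Each row of results corresponds to one canonical component evaluated in the testing half.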

Usage

cca_splithalf(
  X_FIT,
  Y_FIT,
  X_PRED,
  Y_PRED,
  ProcrustX = NULL,
  ProcrustY = NULL,
  ncomp = NULL,
  alpha = 0.05
)

Arguments

X_FIT

Numeric Matrix or Data Frame [N, P1] containing the training dataset predictor variables.

Y_FIT

Numeric Matrix or Data Frame [N, P2] containing the training dataset outcome variables.

X_PRED

Numeric Matrix or Data Frame [N, P1] containing the testing dataset predictor variables. Variables should be ordered in the same way as for X_FIT.

Y_PRED

Numeric Matrix or Data Frame [N, P2] containing the testing dataset outcome variables. Variables should be ordered in the same way as for Y_FIT.

ProcrustX

Numeric Matrix [ncomp, P1] containing the target matrix for Procrustes Analysis. The raw coefficient matrix obtained from X_FIT is aligned to the ProcrustX target matrix, and the aligned coefficients are then used when fitting the CCA model to X_PRED.

ProcrustY

Numeric Matrix [ncomp, P2] containing the target matrix for Procrustes Analysis. The raw coefficient matrix obtained from Y_FIT is aligned to the ProcrustY target matrix, and the aligned coefficients are then used when fitting the CCA model to Y_PRED.

ncomp

Numeric Scalar. Number of CCA components to keep in analyses. Must be equal to or less than min(P1,P2).

alpha

Numeric Scalar. Alpha level for estimating a 100(1-alpha)% confidence interval for each canonical correlation. Default is .05 for estimating a 95% confidence interval.
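To make the ProcrustX and ProcrustY arguments more concrete, here is a minimal sketch of an orthogonal Procrustes rotation in base R. The documentation does not say which Procrustes variant the package uses, so the SVD-based solution below is an assumption, and procrustes_rotate() is a hypothetical helper rather than a ccatools function. The helper works on [P, ncomp] matrices, so the [ncomp, P1] target is transposed first.

```r
# Hypothetical helper: rotate `coef` (variables x components) so that it
# matches `target` (same shape) as closely as possible in the least-squares
# sense, using an orthogonal rotation.
procrustes_rotate <- function(coef, target) {
  s <- svd(crossprod(coef, target))  # t(coef) %*% target = U D V'
  rotation <- s$u %*% t(s$v)         # orthogonal Q minimising ||coef %*% Q - target||
  list(rotated = coef %*% rotation, rotation = rotation)
}

set.seed(2)
p1 <- 5; ncomp <- 2

# Target in the [ncomp, P1] layout given in the argument description.
target_t <- matrix(rnorm(ncomp * p1), nrow = ncomp)

# Coefficient matrix equal to the target with its two components swapped
# and one of them sign-flipped.
flip_swap <- matrix(c(0, -1, 1, 0), nrow = 2)
coef <- t(target_t) %*% flip_swap

aligned <- procrustes_rotate(coef, t(target_t))
max(abs(aligned$rotated - t(target_t)))  # numerically zero: the target is recovered
```

In this toy case the rotation undoes the swap and sign flip exactly; with real coefficient matrices the alignment is only approximate.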

Details

The function also calculates the variance explained (R2; see R2_Matrix) for each outcome variable by running separate linear regression models that predict each outcome from the predictor canonical variates estimated from X_FIT (and Y_FIT). The number of canonical variates used in each regression model is varied from one to all, to examine how R2 increases as each new variate is added.

Two versions of the same algorithm are used. The first, which produces R2_matrix as output, takes the predicted CCA variates in the testing dataset and fits the linear regression models for each outcome in that same dataset. Because the regression coefficients are estimated and evaluated on the same data, R2 is biased upward, especially when the sample size is small (which is why some use the adjusted R2 metric). This version should be avoided unless the sample size is very large.
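The first version can be sketched as follows. This is an illustration only, assuming stats::cancor() and stats::lm() stand in for the package internals; the simulated data and names such as x_var_test and r2_in_sample are hypothetical.

```r
set.seed(3)

# Simulated data (illustrative only).
n <- 150; p1 <- 4; p2 <- 3
latent <- rnorm(2 * n)
X <- matrix(rnorm(2 * n * p1), ncol = p1) + latent
Y <- matrix(rnorm(2 * n * p2), ncol = p2) + latent
train <- seq_len(n)
test  <- n + seq_len(n)

fit <- cancor(X[train, ], Y[train, ])
ncomp <- 3

# Predicted CCA variates in the testing dataset
# (training coefficients applied to the testing predictors).
x_var_test <- scale(X[test, ], center = fit$xcenter, scale = FALSE) %*%
  fit$xcoef[, 1:ncomp]

# R2 for each outcome (rows) when using the first k variates (columns).
# The regression is fitted and evaluated in the same testing data,
# which is the source of the optimistic bias.
r2_in_sample <- sapply(seq_len(ncomp), function(k) {
  apply(Y[test, ], 2, function(y) {
    summary(lm(y ~ x_var_test[, 1:k, drop = FALSE]))$r.squared
  })
})
dimnames(r2_in_sample) <- list(paste0("Y", 1:p2), paste0("k=", 1:ncomp))
r2_in_sample
```

Because the models are nested, each row of r2_in_sample can only stay the same or increase as variates are added, whether or not the extra variate carries real signal.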

The second version of the algorithm, which produces R2_matrix_unbiased as output, also fits the linear regression models in the training dataset. To obtain the predicted outcome scores, the testing dataset predictor scores (X_PRED) are multiplied by the raw coefficients (xcoef) and then by the linear regression coefficients (beta).
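A minimal sketch of this second version is shown below, again using stats::cancor() and stats::lm() as stand-ins. The out-of-sample R2 is computed here as 1 - SSE/SST in the testing data, which is an assumption; the documentation does not state the exact formula the package uses. All object names are hypothetical.

```r
set.seed(4)

# Simulated data (illustrative only).
n <- 150; p1 <- 4; p2 <- 3
latent <- rnorm(2 * n)
X <- matrix(rnorm(2 * n * p1), ncol = p1) + latent
Y <- matrix(rnorm(2 * n * p2), ncol = p2) + latent
train <- seq_len(n)
test  <- n + seq_len(n)

fit <- cancor(X[train, ], Y[train, ])
ncomp <- 3
xcoef <- fit$xcoef[, 1:ncomp]

# Canonical variates in both halves, using training means and coefficients.
center_x  <- function(M) scale(M, center = fit$xcenter, scale = FALSE)
var_train <- center_x(X[train, ]) %*% xcoef
var_test  <- center_x(X[test, ])  %*% xcoef

r2_out_of_sample <- sapply(seq_len(ncomp), function(k) {
  # Regression coefficients (intercept plus k slopes) estimated in training data.
  beta <- coef(lm(Y[train, ] ~ var_train[, 1:k, drop = FALSE]))
  # Predicted outcomes in the testing data: testing variates times beta.
  y_hat <- cbind(1, var_test[, 1:k, drop = FALSE]) %*% beta
  # Out-of-sample R2 per outcome (assumed definition: 1 - SSE / SST).
  sse <- colSums((Y[test, ] - y_hat)^2)
  sst <- colSums(scale(Y[test, ], scale = FALSE)^2)
  1 - sse / sst
})
dimnames(r2_out_of_sample) <- list(paste0("Y", 1:p2), paste0("k=", 1:ncomp))
r2_out_of_sample
```

Unlike the in-sample version, these values are not guaranteed to increase with k and can even be negative when a model predicts worse than the testing-set mean.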

Value

A list containing the following components


giac01/ccatools documentation built on July 15, 2021, 4:33 a.m.