cca_splithalf: Split-Half CCA code
In giac01/ccatools: Tools for Canonical Correlation Analysis


View source: R/analysis_functions.R

Run a CCA model in the training dataset and validate its performance in the testing dataset. The function estimates confidence intervals and p-values using standard inferential methods for Pearson's correlation coefficient.
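The split-half idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses `stats::cancor` and `stats::cor.test` in place of the internal ccatools routines, and the simulated data and all object names (`X`, `Y`, `train`, `test`, `res`) are hypothetical.

```r
set.seed(1)
n <- 400; p1 <- 4; p2 <- 3

# Simulated data: a shared latent variable induces an X-Y association
latent <- rnorm(n)
X <- matrix(rnorm(n * p1), n, p1) + 0.8 * latent
Y <- matrix(rnorm(n * p2), n, p2) + 0.8 * latent

train <- seq_len(n / 2)
test  <- setdiff(seq_len(n), train)

# Fit CCA in the training half only
fit   <- stats::cancor(X[train, ], Y[train, ])
ncomp <- min(p1, p2)

# Apply the training centering and raw coefficients to the testing half
x_test <- scale(X[test, ], center = fit$xcenter, scale = FALSE)
y_test <- scale(Y[test, ], center = fit$ycenter, scale = FALSE)
U <- x_test %*% fit$xcoef[, seq_len(ncomp), drop = FALSE]
V <- y_test %*% fit$ycoef[, seq_len(ncomp), drop = FALSE]

# Pearson correlation between paired test-set variates, with a
# 100(1 - alpha)% confidence interval and p-value from cor.test
alpha <- 0.05
res <- t(sapply(seq_len(ncomp), function(k) {
  ct <- stats::cor.test(U[, k], V[, k], conf.level = 1 - alpha)
  c(cc = unname(ct$estimate),
    lower = ct$conf.int[1],
    upper = ct$conf.int[2],
    p = ct$p.value)
}))
res
```

Because the coefficients are estimated in one half and evaluated in the other, the correlations in `res` are out-of-sample estimates and will usually be smaller than the in-sample values in `fit$cor`.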

cca_splithalf(
  X_FIT,
  Y_FIT,
  X_PRED,
  Y_PRED,
  ProcrustX = NULL,
  ProcrustY = NULL,
  ncomp = NULL,
  alpha = 0.05
)

`X_FIT`	Numeric Matrix or Data Frame [N, P1] containing the training dataset predictor variables.
`Y_FIT`	Numeric Matrix or Data Frame [N, P2] containing the training dataset outcome variables.
`X_PRED`	Numeric Matrix or Data Frame [N, P1] containing the testing dataset predictor variables. Variables should be ordered in the same way as for X_FIT.
`Y_PRED`	Numeric Matrix or Data Frame [N, P2] containing the testing dataset outcome variables. Variables should be ordered in the same way as for Y_FIT.
`ProcrustX`	Numeric Matrix [ncomp, P1] containing the target matrix for Procrustes Analysis. The raw coefficient matrix obtained from X_FIT is aligned to the ProcrustX target matrix. The aligned coefficients are then used when fitting the CCA model to X_PRED.
`ProcrustY`	Numeric Matrix [ncomp, P2] containing the target matrix for Procrustes Analysis. The raw coefficient matrix obtained from Y_FIT is aligned to the ProcrustY target matrix. The aligned coefficients are then used when fitting the CCA model to Y_PRED.
`ncomp`	Numeric Scalar. Number of CCA components to keep in analyses. Must be equal to or less than min(P1,P2).
`alpha`	Numeric Scalar. Alpha level for estimating a 100(1-alpha)% confidence interval for each canonical correlation. Default is .05 for estimating a 95% confidence interval.
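For readers unfamiliar with the Procrustes alignment mentioned for `ProcrustX` and `ProcrustY`, the following is a minimal sketch of an orthogonal Procrustes rotation in base R. It is an assumption about the general technique, not a description of what ccatools does internally. The helper `procrustes_rotate` is hypothetical, and the matrices here are laid out as variables by components, so a target stored as [ncomp, P1] would need to be transposed first.

```r
# Hypothetical helper: find the orthogonal matrix Q that minimises
# ||A %*% Q - target||_F, using the SVD of t(A) %*% target
procrustes_rotate <- function(A, target) {
  s <- svd(crossprod(A, target))
  Q <- s$u %*% t(s$v)
  list(rotated = A %*% Q, rotation = Q)
}

set.seed(4)
target <- matrix(rnorm(12), 4, 3)   # 4 variables, 3 components

# Build a coefficient matrix that is a rotated copy of the target
theta <- pi / 5
R <- diag(3)
R[1:2, 1:2] <- c(cos(theta), sin(theta), -sin(theta), cos(theta))
A <- target %*% t(R)

out <- procrustes_rotate(A, target)
max(abs(out$rotated - target))      # close to zero: the rotation is recovered
```

An orthogonal rotation of this kind can also flip the sign of a component, which is one reason alignment to a common target is useful when comparing coefficients across resamples.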

The function also calculates the variance explained (R2; see R2_matrix) for each outcome variable, using separate linear regression models that predict each outcome from the predictor canonical variates estimated from X_FIT (and Y_FIT). The number of canonical variates entered into each regression model is varied from one up to all of them, to examine how R2 changes as each additional variate is added.

Two versions of the same algorithm are used. The first, which produces R2_matrix as output, fits the linear regression models in the testing dataset, using the predicted CCA variates as predictors of each outcome. Because the regression coefficients are estimated in the same data used to compute R2, this leads to bias in R2, especially when the sample size is small (which is why some use the adjusted R2 metric). This version should be avoided unless the sample size is very large.

The second version of the algorithm, which produces R2_matrix_unbiased as output, also fits the linear regression models in the training dataset. To obtain the predicted outcome scores, the testing dataset predictor scores (X_PRED) are multiplied by the raw coefficients (xcoef) and then by the linear regression coefficients (beta).
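The difference between the two R2 calculations can be illustrated with a base R sketch. It assumes `stats::cancor` as a stand-in for the package's internal CCA routine, and the simulated data and the names `R2_refit`, `R2_holdout`, and `r2` are hypothetical, chosen only for this example.

```r
set.seed(2)
n <- 300; p1 <- 4; p2 <- 2
X <- matrix(rnorm(n * p1), n, p1)
B <- matrix(c(1, 0.5, 0, 0,  0, 0, 1, 0.5), p1, p2)
Y <- X %*% B + matrix(rnorm(n * p2), n, p2)
train <- 1:200
test  <- 201:300

fit   <- stats::cancor(X[train, ], Y[train, ])
ncomp <- min(p1, p2)
xcoef <- fit$xcoef[, seq_len(ncomp), drop = FALSE]

# Predictor canonical variates, using training centering and coefficients
U_train <- scale(X[train, ], center = fit$xcenter, scale = FALSE) %*% xcoef
U_test  <- scale(X[test, ],  center = fit$xcenter, scale = FALSE) %*% xcoef

# Coefficient of determination from observed and predicted values
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

R2_refit <- R2_holdout <- matrix(
  NA_real_, p2, ncomp,
  dimnames = list(paste0("Y", 1:p2), paste0("ncomp", 1:ncomp))
)

for (j in seq_len(p2)) {
  for (k in seq_len(ncomp)) {
    # Version 1: regression re-estimated in the testing data
    m_test <- lm(Y[test, j] ~ U_test[, 1:k, drop = FALSE])
    R2_refit[j, k] <- r2(Y[test, j], fitted(m_test))

    # Version 2: regression estimated in the training data, then
    # applied to the testing variates (intercept plus U_test %*% beta)
    beta <- coef(lm(Y[train, j] ~ U_train[, 1:k, drop = FALSE]))
    pred <- drop(cbind(1, U_test[, 1:k, drop = FALSE]) %*% beta)
    R2_holdout[j, k] <- r2(Y[test, j], pred)
  }
}

R2_refit
R2_holdout
```

In this sketch `R2_refit` can never fall below `R2_holdout`, because least squares re-estimated in the testing data minimises the residual sum of squares there. That gap is the optimism the second version is designed to avoid.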

A list containing the following components:

model_results - Full output from the ccatools::.cca function used internally.
predicted_cc - Predicted Canonical Correlations
confint_cc - Predicted Canonical Correlation Confidence Interval
pvalue_cc - P-value for Predicted Canonical Correlation
combined_cc - Table with Predicted Canonical Correlations, Confidence Intervals and P-values
R2_matrix - Matrix with outcome variables on rows, and columns indicating the variance explained (R2; estimated with coefficient of determination) for each outcome variable when using linear regression models to predict each outcome. The columns indicate how much variance can be explained in each outcome when the number of canonical variates extracted varies. The final column indicates how much variance can be explained by a simple linear regression model.
R2_matrix_BinaryOutcomes_Combined - Same as R2_matrix, but using logistic regression to predict binary outcomes. The same coefficient of determination is calculated using the response residuals and response sums of squares. Will return NULL if there are no binary outcomes. Note that this function automatically recodes binary outcomes to 0/1 for logistic regression.
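As a rough illustration of the binary-outcome calculation, the sketch below recodes a two-level outcome to 0/1, fits a logistic regression with `stats::glm`, and computes a coefficient of determination from the response-scale residuals. The variable `u` stands in for a single canonical variate, and all names and simulated data are hypothetical. How the package splits this work between training and testing data is not shown here.

```r
set.seed(3)
n <- 300
u <- rnorm(n)   # stands in for one predictor canonical variate

# A binary outcome stored as labels rather than 0/1
y_raw <- ifelse(rbinom(n, 1, plogis(1.5 * u)) == 1, "yes", "no")

# Recode to 0/1: factor levels are sorted, so "no" -> 0 and "yes" -> 1
y01 <- as.integer(factor(y_raw)) - 1L

# Logistic regression; fitted() returns predicted probabilities
m     <- glm(y01 ~ u, family = binomial())
p_hat <- fitted(m)

# R2 on the response scale: 1 - SS_residual / SS_total
R2_binary <- 1 - sum((y01 - p_hat)^2) / sum((y01 - mean(y01))^2)
R2_binary
```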