cvselpscca: Cross validation for Sparse Canonical Correlation Analysis


View source: R/cvselpscca.R

Description

Performs nfolds-fold cross validation to select optimal tuning parameters for SELPCCA based on training data. To apply the optimal tuning parameters to testing data, you may also use multiplescca.
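
As a rough illustration of what nfolds cross validation involves, the sketch below assigns samples to folds and loops over them. It is a generic sketch, not the package's internal code: the sample size n, the variable foldid, and the random fold assignment are all assumptions made for the example.

```r
# Illustrative sketch only; the package's internal fold assignment may differ.
set.seed(1)
n <- 23        # hypothetical number of samples
nfolds <- 5

# Randomly assign each sample to one of nfolds roughly equal-sized folds
foldid <- sample(rep(seq_len(nfolds), length.out = n))

for (k in seq_len(nfolds)) {
  test_idx  <- which(foldid == k)   # held-out samples for fold k
  train_idx <- which(foldid != k)   # samples used for fitting in fold k
  # For each candidate tuning parameter: fit on train_idx, score on test_idx
}

fold_sizes <- table(foldid)         # number of samples in each fold
```

The tuning parameters with the best average held-out score across folds would then be chosen; the scoring criterion used by SELPCCA is described in the reference article.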

Usage

cvselpscca(Xdata1=Xdata1,Xdata2=Xdata2,ncancorr=ncancorr,CovStructure="Iden",
          isParallel=TRUE,ncores=NULL,nfolds=5,ngrid=10,
          standardize=TRUE,thresh=0.0001,maxiteration=20)

Arguments

Xdata1

A matrix of size n \times p for first dataset. Rows are samples and columns are variables.

Xdata2

A matrix of size n \times q for second dataset. Rows are samples and columns are variables.

ncancorr

Number of canonical correlation vectors. Default is 1.

CovStructure

Covariance structure to use in estimating sparse canonical correlation vectors. Either "Iden" or "Ridge". Iden assumes the covariance matrix for each dataset is identity. Ridge uses the sample covariance for each dataset. See reference article for more details.

isParallel

TRUE or FALSE for parallel computing. Default is TRUE.

ncores

Number of cores to be used for parallel computing. Only used if isParallel=TRUE. If isParallel=TRUE and ncores=NULL, defaults to half the number of system cores.

nfolds

Number of cross validation folds. Default is 5.

ngrid

Number of grid points for tuning parameters. Default is 10 for each dataset.

standardize

TRUE or FALSE. If TRUE, data will be normalized to have mean zero and variance one for each variable. Default is TRUE.

maxiteration

Maximum number of iterations allowed if the algorithm has not converged. Default is 20.

thresh

Threshold for convergence. Default is 0.0001.
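
The standardize and ncores defaults described above can be sketched in base R. This is an illustration only, assuming that standardization behaves like scale() with centering and scaling, and that the core count is derived from parallel::detectCores(); the package's internal code may differ. The matrix X is simulated for the example.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)
X <- matrix(rnorm(50 * 4, mean = 3, sd = 2), nrow = 50, ncol = 4)  # simulated data

# Center each column to mean zero and scale it to unit variance
Xstd <- scale(X, center = TRUE, scale = TRUE)
col_means <- colMeans(Xstd)        # approximately 0 for every column
col_vars  <- apply(Xstd, 2, var)   # approximately 1 for every column

# Assumed fallback when ncores = NULL: half of the detected cores, at least 1
total_cores <- parallel::detectCores()
if (is.na(total_cores)) total_cores <- 1L
ncores <- max(1L, floor(total_cores / 2))
```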

Details

The function will return several R objects, which can be assigned to a variable. To see the results, use the "$" operator.
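
A minimal sketch of accessing results with the "$" operator follows. The list mycv here is a hypothetical stand-in built by hand with placeholder numbers; it only mimics some of the component names listed under Value and is not real output from cvselpscca.

```r
# Hypothetical stand-in for the object returned by cvselpscca (placeholder values)
mycv <- list(
  hatalpha     = matrix(c(0.8, 0, -0.6, 0), ncol = 1),
  hatbeta      = matrix(c(0, 1, 0), ncol = 1),
  CovStructure = "Iden",
  maxcorr      = 0.9
)

# List the available components, then extract one with "$"
component_names <- names(mycv)
alpha <- mycv$hatalpha

# Variables in the first dataset with nonzero loadings on the first vector
selected1 <- which(alpha[, 1] != 0)
```

Because the estimated vectors are sparse, the indices of the nonzero entries identify the variables that contribute to each canonical correlation vector.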

Value

hatalpha

Estimated sparse canonical correlation vectors for first dataset.

hatbeta

Estimated sparse canonical correlation vectors for second dataset.

CovStructure

Covariance structure used in estimating sparse canonical correlation vectors. Either "Iden" or "Ridge".

optTau

Optimal tuning parameters for each dataset.

maxcorr

Estimated canonical correlation coefficient.

tunerange

Grid values for each dataset used for searching optimal tuning parameters.

References

Sandra E. Safo, Jeongyoun Ahn, Yongho Jeon, and Sungkyu Jung (2018), Sparse Generalized Eigenvalue Problem with Application to Canonical Correlation Analysis for Integrative Analysis of Methylation and Gene Expression Data. Biometrics.

See Also

multiplescca

Examples

library(SELPCCA)
##---- read in data
data(DataExample)

Xdata1=DataExample[[1]]
Xdata2=DataExample[[2]]


##---- call cross validation to estimate first canonical correlation vectors
ncancorr=1
mycv=cvselpscca(Xdata1=Xdata1,Xdata2=Xdata2,ncancorr=ncancorr,CovStructure="Iden",
                isParallel=FALSE,ncores=NULL,nfolds=5,ngrid=10,
                standardize=TRUE,thresh=0.0001,maxiteration=20)

#check output
train.correlation=mycv$maxcorr

optTau=mycv$optTau

hatalpha=mycv$hatalpha

hatbeta=mycv$hatbeta

#obtain correlation plot using training data
scoresX1=Xdata1%*% hatalpha
scoresX2=Xdata2%*% hatbeta
plot(scoresX1, scoresX2,lwd=3,
         xlab=paste("First Canonical correlation variate for dataset", 1),
         ylab=paste("First Canonical correlation variate for dataset", 2),
         main=paste("Correlation plot for datasets",1, "and" ,2, ",", "\u03C1 =", mycv$maxcorr))


#obtain correlation plot using testing data

Xtestdata1=DataExample[[3]]
Xtestdata2=DataExample[[4]]
scoresX1=Xtestdata1%*%hatalpha
scoresX2=Xtestdata2%*%hatbeta
mytestcorr=round(abs(cor(Xtestdata1%*%hatalpha,Xtestdata2%*%hatbeta)),3)

plot(scoresX1, scoresX2,lwd=3,xlab=paste(
         "First Canonical correlation variate for dataset", 1),
         ylab=paste("First Canonical correlation variate for dataset", 2),
         main=paste("Correlation plot for datasets",1, "and" ,2, ",", "\u03C1 =", mytestcorr))

lasandrall/SELPCCA documentation built on June 8, 2020, 12:38 a.m.