cvSIDANet: Cross validation for Sparse Integrative Discriminant Analysis...

View source: R/cvSIDANet.R

cvSIDANet    R Documentation

Cross validation for Sparse Integrative Discriminant Analysis for Multi-view Structured (Network) Data

Description

Performs nfolds-fold cross-validation to select optimal tuning parameters for sidanet based on training data. The selected parameters are then used with the training or testing data to predict class membership. Covariates may be included; they are not penalized. To apply the optimal tuning parameters to testing data, you may also use sidanet.
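The general idea of choosing a tuning parameter by k-fold cross-validation can be sketched in base R as follows. This is a minimal illustration and not the SIDANet algorithm: the toy data, the helper names (make_folds, fit_predict, cv_error) and the soft-thresholded mean-difference classifier are all assumptions introduced for the example, and a single tuning parameter is searched where cvSIDANet selects one per view.

```r
# Generic k-fold cross-validation sketch (stand-in classifier, NOT SIDANet).
set.seed(1)

# Toy two-class data: 40 samples, 10 variables, first 2 are informative.
n <- 40; p <- 10
Y <- rep(1:2, each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[Y == 2, 1:2] <- X[Y == 2, 1:2] + 2

# Stratified fold labels: each class is spread evenly over the folds.
make_folds <- function(Y, nfolds) {
  folds <- integer(length(Y))
  for (cl in unique(Y)) {
    idx <- which(Y == cl)
    folds[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  folds
}

# Hypothetical sparse classifier: soft-threshold the class mean difference
# by tau, then classify by the sign of the projected, centered data.
fit_predict <- function(Xtr, Ytr, Xte, tau) {
  m1 <- colMeans(Xtr[Ytr == 1, , drop = FALSE])
  m2 <- colMeans(Xtr[Ytr == 2, , drop = FALSE])
  d <- m2 - m1
  w <- sign(d) * pmax(abs(d) - tau, 0)
  score <- as.vector(sweep(Xte, 2, (m1 + m2) / 2) %*% w)
  ifelse(score > 0, 2L, 1L)
}

# Average misclassification rate over the held-out folds for one tau.
cv_error <- function(X, Y, folds, tau) {
  errs <- sapply(sort(unique(folds)), function(k) {
    test <- folds == k
    pred <- fit_predict(X[!test, , drop = FALSE], Y[!test],
                        X[test, , drop = FALSE], tau)
    mean(pred != Y[test])
  })
  mean(errs)
}

nfolds <- 5
folds <- make_folds(Y, nfolds)
grid <- seq(0, 1.5, length.out = 8)   # full grid of candidate values
errors <- sapply(grid, function(tau) cv_error(X, Y, folds, tau))
opt_tau <- grid[which.min(errors)]

# A random search would evaluate only a random subset of the grid.
random_grid <- sort(sample(grid, 4))
```

The value with the smallest cross-validated error is then refit on all training data, which is the role optTau plays in the output of cvSIDANet.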

Usage

cvSIDANet(Xdata=Xdata,Y=Y,myedges=myedges,myedgeweight=myedgeweight,withCov=FALSE,
          plotIt=FALSE,Xtestdata=NULL,Ytest=NULL,isParallel=TRUE,ncores=NULL,
          gridMethod='RandomSearch', AssignClassMethod='Joint', nfolds=5,ngrid=8,
          standardize=TRUE,maxiteration=20, weight=0.5,thresh=1e-03,eta=0.5)

Arguments

Xdata

A list with each entry containing training views of size n \times p_d, where d =1,...,D. Rows are samples and columns are variables. If covariates are available, they should be included as a separate view, and set as the last dataset. For binary or categorical covariates (assumes no ordering), we suggest the use of indicator variables.

Y

n \times 1 vector of class membership.

myedges

A list with each entry containing a M_d\times 2 matrix of edge information for each view. If a view has no edge information, set to 0; this will default to SIDA. If covariates are available as a view (Dth view), the edge information should be set to 0.

myedgeweight

A list with each entry containing a M_d\times 1 vector of weight information for each view. If a view has no weight information, set to 0; this will use the Laplacian of an unweighted graph. If covariates are available as a view (Dth view), the weight information should be set to 0.

withCov

TRUE or FALSE, indicating whether covariates are available. If TRUE, combine all covariates into one dataset and place it last in Xdata. For binary and categorical variables, use indicator matrices/vectors. Default is FALSE.

plotIt

TRUE or FALSE. If TRUE, produces discriminant and correlation plots. Default is FALSE.

Xtestdata

A list with each entry containing testing views of size ntest \times p_d, where d =1,...,D. Rows are samples and columns are variables. The order of the list should be the same as the order for the training data, Xdata. Use if you want to predict on a testing dataset. If no Xtestdata, set to NULL.

Ytest

ntest \times 1 vector of test class membership. If no testing data provided, set to NULL.

isParallel

TRUE or FALSE for parallel computing. Default is TRUE.

ncores

Number of cores to be used for parallel computing. Only used if isParallel=TRUE. If isParallel=TRUE and ncores=NULL, defaults to half the number of system cores.

gridMethod

GridSearch or RandomSearch. Optimize tuning parameters over full grid or random grid. Default is RandomSearch.

AssignClassMethod

Classification method. Either Joint or Separate. Joint uses all discriminant vectors from the D datasets to predict class membership. Separate predicts class membership separately for each dataset. Default is Joint.

nfolds

Number of cross validation folds. Default is 5.

ngrid

Number of grid points for tuning parameters. Default is 8 for each view if D=2. If D>2, default is 5.

standardize

TRUE or FALSE. If TRUE, data will be normalized to have mean zero and variance one for each variable. Default is TRUE.

maxiteration

Maximum number of iterations if the algorithm has not converged. Default is 20.

weight

Balances separation and association. Default is 0.5.

thresh

Threshold for convergence. Default is 0.001.

eta

Balances the selection of networks and of variables within networks. Default is 0.5.
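The input layout described by the arguments above can be sketched as follows. The data are simulated and every object name other than the argument names is hypothetical. Two points are assumptions, not statements from this page: that each row of an edge matrix holds the column indices of two connected variables, and that a plain numeric vector is accepted for the edge weights.

```r
set.seed(2)
n <- 30

# Two toy views; rows are samples, columns are variables.
view1 <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("g", 1:6)))
view2 <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("m", 1:4)))

# Covariates: one numeric, one unordered categorical variable.
covars <- data.frame(
  age  = rnorm(n, 50, 10),
  site = factor(rep(c("A", "B", "C"), length.out = n))
)
# Indicator coding for the categorical covariate; drop the intercept column.
covview <- model.matrix(~ age + site, data = covars)[, -1]

# The covariate view is placed last.
Xdata <- list(view1, view2, covview)
Y <- rep(1:2, length.out = n)

# Edge information: an M_d x 2 matrix per view (assumed to be pairs of
# column indices); 0 for views without edges and for the covariate view.
edges1 <- rbind(c(1, 2), c(2, 3), c(4, 5))
myedges <- list(edges1, 0, 0)

# Edge weights: one weight per edge; 0 where no weights are available.
myedgeweight <- list(c(0.9, 0.5, 0.7), 0, 0)

# With the SIDA package loaded, the call would then take the form:
# mycv <- cvSIDANet(Xdata, Y, myedges, myedgeweight, withCov = TRUE)
```

Keeping myedges and myedgeweight the same length as Xdata, with 0 in the position of the covariate view, matches the requirement that the covariate view carries no edge or weight information.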

Details

The function returns a list of R objects, which can be assigned to a variable. To access an individual result, use the "$" operator.

Value

sidaerror

Estimated classification error. If testing data are provided, this is the test classification error; otherwise, it is the training error.

sidacorrelation

Sum of pairwise RV coefficients. Normalized to be within 0 and 1, inclusive.

hatalpha

A list of estimated sparse discriminant vectors for each view.

PredictedClass

Predicted class. If AssignClassMethod='Separate', this will be an ntest\times D matrix, with each column giving the predicted class for one view.

optTau

Optimal tuning parameters for each view, not including covariates, if available.

gridValues

Grid values used for searching optimal tuning parameters.

AssignClassMethod

Classification method used. Joint or Separate.

gridMethod

Grid method used. Either GridSearch or RandomSearch.
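To make the two summary measures above concrete, the sketch below computes a classification error and RV coefficients in base R. The RV coefficient follows its standard definition for column-centered matrices. Averaging over view pairs to keep the result in [0, 1] is an assumption; the exact normalization used by the package is not stated on this page. The function rv_coef and all toy objects are hypothetical.

```r
# RV coefficient between two matrices with the same number of rows:
# tr(AA'BB') / sqrt(tr(AA'AA') * tr(BB'BB')), after column-centering.
rv_coef <- function(A, B) {
  A <- scale(as.matrix(A), center = TRUE, scale = FALSE)
  B <- scale(as.matrix(B), center = TRUE, scale = FALSE)
  SA <- tcrossprod(A)   # n x n cross-product of the first matrix
  SB <- tcrossprod(B)   # n x n cross-product of the second matrix
  sum(SA * SB) / sqrt(sum(SA * SA) * sum(SB * SB))
}

set.seed(3)
# Toy discriminant scores for three views (20 samples each).
s1 <- matrix(rnorm(20), ncol = 1)
s2 <- 2 * s1 + 1                   # perfectly linearly related to s1
s3 <- matrix(rnorm(20), ncol = 1)
scores <- list(s1, s2, s3)

# Pairwise RV coefficients, averaged over pairs (assumed normalization).
pairs <- combn(length(scores), 2)
rv_pairs <- apply(pairs, 2, function(ix) {
  rv_coef(scores[[ix[1]]], scores[[ix[2]]])
})
rv_summary <- mean(rv_pairs)

# Classification error: proportion of misclassified samples.
truth <- c(1, 2, 1, 1)
pred  <- c(1, 2, 2, 1)
class_error <- mean(pred != truth)
```

An RV coefficient near 1 indicates that two sets of scores are strongly linearly associated, while a value near 0 indicates little association.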

References

Sandra E. Safo, Eun Jeong Min, and Lillian Haine (2019). Sparse Linear Discriminant Analysis for Multi-view Structured Data. Submitted.

See Also

sidanet, CorrelationPlots, DiscriminantPlots

Examples

library(SIDA)
##---- read in sample data
data(SIDANetDataExample)

##---- call cross validation

#example with two views having edge weights

Xdata=SIDANetDataExample[[1]]
Y=SIDANetDataExample[[2]]
Xtestdata=SIDANetDataExample[[3]]
Ytest=SIDANetDataExample[[4]]
myedges=SIDANetDataExample[[5]]
myedgeweight=SIDANetDataExample[[6]]

mycv=cvSIDANet(Xdata,Y,myedges,myedgeweight,withCov=FALSE,plotIt=FALSE,Xtestdata=Xtestdata,
          Ytest=Ytest,isParallel=TRUE,ncores=NULL,gridMethod='RandomSearch',
          AssignClassMethod='Joint',nfolds=5,ngrid=8,standardize=TRUE,
          maxiteration=20, weight=0.5,thresh=1e-03,eta=0.5)


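#check output; the Value section lists sidaerror and sidacorrelation,
#so run names(mycv) to confirm the element names in your installed version
names(mycv)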
test.error=mycv$sidaneterror

test.correlation=mycv$sidanetcorrelation

optTau=mycv$optTau

hatalpha=mycv$hatalpha

#---------Discriminant plot
mydisplot=DiscriminantPlots(Xtestdata,Ytest,mycv$hatalpha)

mycorrplot=CorrelationPlots(Xtestdata,Ytest,mycv$hatalpha)
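Because hatalpha holds sparse discriminant vectors, a common follow-up is to list which variables were selected in each view. The sketch below uses a small stand-in list (hatalpha_demo, hypothetical) so that it runs without the package, and assumes each element has one row per variable; with real output, mycv$hatalpha would take its place.

```r
# Stand-in for mycv$hatalpha: one matrix of discriminant loadings per view.
hatalpha_demo <- list(
  matrix(c(0.8, 0, -0.6, 0, 0), ncol = 1),  # view 1: 5 variables
  matrix(c(0, 0, 1, 0), ncol = 1)           # view 2: 4 variables
)

# A variable counts as selected if any of its loadings is nonzero.
selected <- lapply(hatalpha_demo, function(a) {
  which(rowSums(abs(as.matrix(a))) > 0)
})

# Number of selected variables per view.
n_selected <- sapply(selected, length)
```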

lasandrall/SIDA documentation built on Oct. 19, 2022, 9:23 a.m.