CANN: Canonical Correlation of Two Sets of Genomic Data

View source: R/CANN.R

CANN    R Documentation

Description

Computes gene-level canonical correlation between two sets of genomic data (for example, gene expression and methylation).

Usage

CANN(geneSet, Edat, Mdat, EMlbl = c("Expr", "Methyl"), phdat)

Arguments

geneSet

a gene set collection used to annotate probes to genes

Edat

data frame of the first form of genomic data, such as gene expression data, with rows being probes and columns being subjects. The column names should match the row names of phdat.

Mdat

data frame of the second form of genomic data, such as methylation data, with rows being probes and columns being subjects. The column names should match the row names of phdat.

EMlbl

labels of the two forms of genomic data; default c("Expr", "Methyl") for Edat and Mdat, respectively

phdat

phenotype data with rows being subjects and columns being phenotype variables. The row names should match the column names of Edat and Mdat.
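CANN matches subjects across its three data inputs by name, so it can help to check them before the call. The helper below, `check_sample_alignment`, is hypothetical (it is not part of CCPROMISE). It is a minimal sketch that assumes "match" means the same names in the same order, which is the safer reading; the toy data frames are made up.

```r
# Hypothetical helper (not part of CCPROMISE): TRUE when the subject IDs
# in Edat and Mdat (columns) appear in the same order as in phdat (rows).
check_sample_alignment <- function(Edat, Mdat, phdat) {
  subj <- rownames(phdat)
  ok_E <- identical(colnames(Edat), subj)
  ok_M <- identical(colnames(Mdat), subj)
  if (!ok_E) warning("column names of Edat do not match row names of phdat")
  if (!ok_M) warning("column names of Mdat do not match row names of phdat")
  ok_E && ok_M
}

# Toy data: 3 subjects, 2 expression probes, 2 methylation probes
phdat <- data.frame(grp = c("A", "B", "A"), row.names = c("s1", "s2", "s3"))
Edat  <- data.frame(s1 = 1:2, s2 = 3:4, s3 = 5:6, row.names = c("e1", "e2"))
Mdat  <- data.frame(s3 = c(0.1, 0.2), s1 = c(0.3, 0.4), s2 = c(0.5, 0.6),
                    row.names = c("m1", "m2"))

# Mdat lists the subjects in a different order; reorder its columns by phdat
Mdat <- Mdat[, rownames(phdat)]
check_sample_alignment(Edat, Mdat, phdat)  # TRUE
```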

Details

The function performs canonical correlation analysis between the two forms of genomic data (Edat and Mdat) for each gene defined by geneSet. If a gene has only one form of genomic data, the first principal component is used. If one form of data has more probesets than there are subjects, only the first n probesets are used, where n is the number of subjects. The function returns a list of three components; see Value for details.
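The per-gene computation described above can be sketched with base R's `stats::cancor()`. This is a minimal sketch under stated assumptions, not the CCPROMISE source: `cc_one_gene` is a hypothetical helper that takes the probe-by-subject rows for one gene from each data set, applies the documented truncation to the first n probesets, and returns the first canonical correlation. The single-form principal-component case is not covered.

```r
# Minimal sketch (hypothetical helper, not the CCPROMISE source):
# canonical correlation between two probe-by-subject matrices for one gene.
cc_one_gene <- function(X, Y) {
  # stats::cancor() expects subjects in rows, so transpose the input
  x <- t(as.matrix(X))
  y <- t(as.matrix(Y))
  n <- nrow(x)
  # Documented rule: keep at most the first n probesets of each form
  x <- x[, seq_len(min(ncol(x), n)), drop = FALSE]
  y <- y[, seq_len(min(ncol(y), n)), drop = FALSE]
  cc <- stats::cancor(x, y)
  list(nX = ncol(x), nY = ncol(y), firstCor = cc$cor[1])
}

# Toy gene: 3 expression probes and 2 methylation probes on 20 subjects
set.seed(1)
n <- 20
z <- rnorm(n)  # shared latent signal
X <- rbind(z + rnorm(n, sd = 0.5), rnorm(n), rnorm(n))
Y <- rbind(-z + rnorm(n, sd = 0.5), rnorm(n))
res <- cc_one_gene(X, Y)
res$firstCor
```

Because the first canonical correlation is the maximum correlation over all linear combinations of the two probe sets, it is never smaller than the absolute correlation between any single pair of probes.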

Value

The output of the function is a list with three components:

CCres

canonical correlation results: a data frame with one row per gene and six columns: Gene (gene name); n.EMlbl[1] (number of probes of the first form of genomic data); n.EMlbl[2] (number of probes of the second form of genomic data); CanonicalCR (canonical correlation of the first pair of canonical components); WilksPermPval (permutation p-value of Wilks' Lambda); WilksAsymPval (p-value from the F-approximation of Wilks' Lambda).

FSTccscore

scores of the first canonical component: a data frame with one row per gene; the first half of the columns holds the first-component scores of the first form of genomic data, and the second half holds those of the second form.

CCload

a data frame of loadings with one row per gene and five columns: (1) gene name; (2) probeset IDs of the first form of genomic data, separated by '|'; (3) the loading of each of those probesets, separated by '|'; (4) probeset IDs of the second form of genomic data, separated by '|'; (5) the loading of each of those probesets, separated by '|'.
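The two p-values in CCres both test Wilks' Lambda, Λ = ∏(1 - r_i^2) over the canonical correlations r_i, where small values indicate association. The sketch below uses Rao's F-approximation for the asymptotic p-value and permutes the subjects of one data set for the permutation p-value. The helper names and the (1 + count)/(B + 1) convention are assumptions, and CCPROMISE's own implementation may differ in details. Inputs have subjects in rows and probes in columns.

```r
# Minimal sketch (hypothetical helpers, not the CCPROMISE source).
# x, y: numeric matrices with subjects in rows and probes in columns.

# Wilks' Lambda for H0: all canonical correlations are zero
wilks_lambda <- function(x, y) prod(1 - stats::cancor(x, y)$cor^2)

# Asymptotic p-value from Rao's F-approximation
wilks_asym_p <- function(x, y) {
  n <- nrow(x); p <- ncol(x); q <- ncol(y)
  lambda <- wilks_lambda(x, y)
  w <- n - 1.5 - (p + q) / 2
  s <- if (p^2 + q^2 - 5 > 0) sqrt((p^2 * q^2 - 4) / (p^2 + q^2 - 5)) else 1
  df1 <- p * q
  df2 <- w * s - p * q / 2 + 1
  Fstat <- (1 - lambda^(1 / s)) / lambda^(1 / s) * df2 / df1
  stats::pf(Fstat, df1, df2, lower.tail = FALSE)
}

# Permutation p-value: shuffling the subjects of y breaks any x-y link;
# a smaller Lambda means a stronger association
wilks_perm_p <- function(x, y, B = 999) {
  obs <- wilks_lambda(x, y)
  perm <- replicate(B, wilks_lambda(x, y[sample(nrow(y)), , drop = FALSE]))
  (1 + sum(perm <= obs)) / (B + 1)
}

set.seed(2)
n <- 30
z <- rnorm(n)
x <- cbind(z + rnorm(n, sd = 0.3), rnorm(n))
y <- cbind(-z + rnorm(n, sd = 0.3), rnorm(n), rnorm(n))
wilks_asym_p(x, y)
wilks_perm_p(x, y, B = 199)
```

With a single probe on each side (p = q = 1), the F-approximation reduces to the usual test of a Pearson correlation, so `wilks_asym_p()` then agrees with `cor.test()`.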
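The first-component scores reported in FSTccscore can be sketched from the `xcoef`, `ycoef`, `xcenter`, and `ycenter` elements returned by `stats::cancor()`. `first_cc_scores` is a hypothetical helper, and laying the scores out as one row with the first-form scores followed by the second-form scores is an assumption based on the FSTccscore description; inputs have subjects in rows.

```r
# Minimal sketch (hypothetical helper, not the CCPROMISE source).
# x, y: numeric matrices with subjects in rows and probes in columns.
first_cc_scores <- function(x, y) {
  if (is.null(colnames(x))) colnames(x) <- paste0("x", seq_len(ncol(x)))
  if (is.null(colnames(y))) colnames(y) <- paste0("y", seq_len(ncol(y)))
  cc <- stats::cancor(x, y)
  xc <- scale(x, center = cc$xcenter, scale = FALSE)
  yc <- scale(y, center = cc$ycenter, scale = FALSE)
  # cancor() names coefficient rows after the columns it actually used
  sx <- drop(xc[, rownames(cc$xcoef), drop = FALSE] %*% cc$xcoef[, 1])
  sy <- drop(yc[, rownames(cc$ycoef), drop = FALSE] %*% cc$ycoef[, 1])
  # One row: first-form scores, then second-form scores
  c(sx, sy)
}

set.seed(3)
n <- 15
z <- rnorm(n)
x <- cbind(z + rnorm(n, sd = 0.5), rnorm(n))
y <- cbind(z + rnorm(n, sd = 0.5), rnorm(n), rnorm(n))
scores <- first_cc_scores(x, y)
length(scores)  # 30, i.e. 2 * n
```

The two halves of the returned row correlate at exactly the first canonical correlation, which is a quick sanity check on any score table laid out this way.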
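The '|'-separated columns of CCload can be split back into named numeric vectors with `strsplit()`. The row below is built by hand with made-up values and hypothetical column names; only the column order follows the documentation, so the code indexes columns by position. `parse_loads` is a hypothetical helper, not part of CCPROMISE.

```r
# Split one '|'-separated ID string and its matching load string into a
# named numeric vector (hypothetical helper, not part of CCPROMISE)
parse_loads <- function(ids, loads) {
  # as.character() guards against columns stored as factors
  stats::setNames(as.numeric(strsplit(as.character(loads), "|", fixed = TRUE)[[1]]),
                  strsplit(as.character(ids), "|", fixed = TRUE)[[1]])
}

# Hand-built CCload-style row with made-up values; the column names are
# hypothetical, so columns are accessed by their documented position
ccload_row <- data.frame(Gene = "GENE1",
                         idsE = "p1|p2|p3", loadE = "0.52|-0.13|0.88",
                         idsM = "m1|m2",    loadM = "-0.71|0.40",
                         stringsAsFactors = FALSE)

expr_loads   <- parse_loads(ccload_row[[2]], ccload_row[[3]])
methyl_loads <- parse_loads(ccload_row[[4]], ccload_row[[5]])
expr_loads  # named vector: p1 = 0.52, p2 = -0.13, p3 = 0.88
```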

Author(s)

Xueyuan Cao <Xueyuan.cao@stjude.org>, Stanley Pounds <stanley.pounds@stjude.org>

References

Hotelling H. (1936). Relations between two sets of variates. Biometrika, 28, 321-377.

See Also

CCPROMISE

Examples

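  library(CCPROMISE)  # provides CANN() and the example data sets
  library(Biobase)    # provides exprs() and pData()
  ## Load the example expression set, methylation set, and gene set collection
  data(exmplESet)
  data(exmplMSet)
  data(exmplGeneSet)
  ## Perform the canonical correlation test for each gene
  test1 <- CANN(geneSet = exmplGeneSet,
                Edat = exprs(exmplESet),
                Mdat = exprs(exmplMSet),
                EMlbl = c("Expr", "Methyl"),
                phdat = pData(exmplESet))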

xueyuancao/CCPROMISE documentation built on May 6, 2023, 8:29 a.m.