cor.matrix: correlation calculation for a set of genes


View source: R/rsgcc.R

Description

This function provides five correlation methods (GCC, PCC, SCC, KCC and BiWt) for calculating the correlations among a set of genes.

Usage

cor.matrix(GEMatrix,
      cpus = 1,
      cormethod = c("GCC", "PCC", "SCC", "KCC", "BiWt"),
      style = c("all.pairs", "pairs.between", "adjacent.pairs", "one.pair"),
      var1.id = NA,
      var2.id = NA,
      pernum = 0,
      sigmethod = c("two.sided", "one.sided"),
      output = c("matrix", "paired"))

Arguments

GEMatrix

a data matrix containing the gene expression data of a set of genes. Each row of the GEMatrix corresponds to a gene, and each column corresponds to the expression level in a sample.

cpus

the number of CPUs used for correlation calculation.

cormethod

a character string that specifies a correlation method to be used for correlation calculation.

style

a character string that indicates whether all or only some of the genes are used for correlation calculation.

var1.id

a numeric vector specifying the row numbers of genes.

var2.id

a numeric vector specifying the row numbers of genes. Suppose var1.id and var2.id are c(1,2) and c(3,6), respectively; then the correlations of the gene pairs (G1,G3) and (G2,G6) will be calculated. For the "pairs.between" and "one.pair" styles, this parameter MUST be defined by the user. For the other styles, it is set automatically by the program.

pernum

the number of permutations used for calculating the statistical significance level (i.e., p-value) of correlations.

sigmethod

a character string ("two.sided" or "one.sided") that specifies the method used to compute the p-value in the permutation test.

output

a character string ("matrix" or "paired") that specifies the output format of the correlations. Specifying "matrix" will output two matrices, one for correlations and one for p-values. Specifying "paired" will output a single matrix, in which each row gives a gene pair together with its correlation and p-value.
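
As a rough illustration of how style, var1.id and var2.id relate to gene pairs, the base R sketch below enumerates row-index pairs for each style. The helper pair_ids is hypothetical and not part of rsgcc; the element-by-element pairing used for "pairs.between" and "one.pair" follows the var2.id description above and is an assumption about how the package behaves.

```r
# Hypothetical helper (not part of rsgcc): list the row-index pairs that each
# style would plausibly cover for a GEMatrix with n genes.
pair_ids <- function(n, style = c("all.pairs", "pairs.between",
                                  "adjacent.pairs", "one.pair"),
                     var1.id = NA, var2.id = NA) {
  style <- match.arg(style)
  if (style == "all.pairs") {
    ids <- t(combn(n, 2))                              # every pair with i < j
  } else if (style == "adjacent.pairs") {
    ids <- cbind(seq_len(n - 1), seq_len(n - 1) + 1)   # (1,2), (2,3), ...
  } else {
    # Assumption: var1.id and var2.id are matched element by element.
    # The default NA is not numeric, so both vectors must be supplied.
    stopifnot(is.numeric(var1.id), is.numeric(var2.id),
              length(var1.id) == length(var2.id))
    ids <- cbind(var1.id, var2.id)
  }
  colnames(ids) <- c("gene1", "gene2")
  ids
}

pair_ids(6, "pairs.between", var1.id = c(1, 2), var2.id = c(3, 6))
pair_ids(4, "adjacent.pairs")
```

For four genes, "all.pairs" yields choose(4, 2) = 6 pairs in this sketch, while "adjacent.pairs" yields three.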

Details

Given a data matrix (e.g., a microarray or RNA-Seq gene expression matrix), this function calculates correlations with GCC and other correlation methods for some (or all) individuals (e.g., genes). The statistical significance (i.e., p-value) of each correlation is derived from a permutation test. Parallel computing options are also provided to speed up the correlation calculation.
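
To make the permutation-based p-value more concrete, here is a minimal base R sketch of a generic permutation test for a correlation. It is not the rsgcc implementation: the function name perm_pvalue is hypothetical, Pearson correlation from stats::cor() stands in for GCC, and the p-value is taken as the plain proportion of permuted correlations at least as extreme as the observed one, which may differ from what the package computes.

```r
# Generic permutation test for a correlation (not rsgcc's own code).
# Pearson correlation from stats::cor() stands in for GCC here.
perm_pvalue <- function(x, y, pernum = 2000,
                        sigmethod = c("two.sided", "one.sided")) {
  sigmethod <- match.arg(sigmethod)
  observed <- cor(x, y)
  # Shuffle y to break any association, then recompute the correlation.
  permuted <- replicate(pernum, cor(x, sample(y)))
  if (sigmethod == "two.sided") {
    mean(abs(permuted) >= abs(observed))   # extreme in either direction
  } else if (observed >= 0) {
    mean(permuted >= observed)             # one-sided, positive direction
  } else {
    mean(permuted <= observed)             # one-sided, negative direction
  }
}

set.seed(1)  # make the shuffles reproducible
g1 <- c(45, 65, 77, 75, 60, 81)
g2 <- c(75, 78, 83, 39, 70, 88)
perm_pvalue(g1, g2, pernum = 2000, sigmethod = "two.sided")
```

A larger pernum gives a finer-grained p-value at the cost of more computation, which is where running on several CPUs can help.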

Value

A list with the following components:

corMatrix

correlation of gene pairs shown in matrix form. This data matrix is generated only when the output format "matrix" is specified.

pvalueMatrix

p-value of correlations shown in matrix form. This data matrix is generated only when the output format "matrix" is specified.

corpvalueMatrix

correlations and p-values listed together in one matrix. This data matrix is generated only when the output format "paired" is specified.
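
The difference between the "matrix" and "paired" layouts can be sketched in base R as follows. This only illustrates the two shapes: Pearson correlation from stats::cor() is used in place of GCC, and the column names gene1, gene2 and cor are assumptions, so the actual corpvalueMatrix returned by the package may be laid out differently.

```r
# Toy GEMatrix: rows are genes, columns are samples.
x <- rbind(gene1 = c(45, 65, 77, 75),
           gene2 = c(75, 78, 83, 39),
           gene3 = c(2, 11, 10, 6))

# Matrix form: genes x genes. cor() correlates columns, hence t(x).
cor_mat <- cor(t(x), method = "pearson")

# Paired form: one row per unordered gene pair (upper triangle only).
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
paired <- data.frame(gene1 = rownames(cor_mat)[idx[, "row"]],
                     gene2 = colnames(cor_mat)[idx[, "col"]],
                     cor   = cor_mat[idx])
paired
```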

Note

(1) The rsgcc package provides the RNA-Seq expression levels of 100 genes as sample data for trying out cor.matrix, cor.pair and other functions in the package. After running the command data(rsgcc), the expression data of these genes will be loaded into the GEMatrix "rnaseq". The user can also load a GEMatrix from a gene expression file, which should be a plain-text gene expression matrix. An example of such a file (e.g., "/home/rsgcc/geneExpFile.txt") is shown below:

      sample1 sample2 sample3 sample4
gene1 45      65      77      75
gene2 75      78      83      39
gene3 2       11      10      6

Then the GEMatrix can be obtained by loading this gene expression file with the command: x <- as.matrix(read.table("/home/rsgcc/geneExpFile.txt"))

(2) var1.id and var2.id should be given as numeric vectors for the "pairs.between" and "one.pair" styles.

(3) To use BiWt, the R package "biwt" should be installed in advance.

(4) To perform parallel computation, the R package "snowfall" should be installed in advance.
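
The file-loading step in Note (1) can be tried without touching the file system layout by writing the example rows to a temporary file. This is a small base R sketch; the temporary path and the has_biwt and has_snowfall variables are illustrative names, not part of rsgcc.

```r
# Write the example file from Note (1) to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("sample1 sample2 sample3 sample4",
             "gene1 45 65 77 75",
             "gene2 75 78 83 39",
             "gene3 2 11 10 6"), path)

# The header has one field fewer than the data rows, so read.table() treats
# the first line as a header and uses the first column as row names.
x <- as.matrix(read.table(path))
dim(x)        # 3 genes by 4 samples
rownames(x)

# Optional dependencies named in the notes; requireNamespace() only reports
# whether each package is installed and does not attach it.
has_biwt     <- requireNamespace("biwt", quietly = TRUE)
has_snowfall <- requireNamespace("snowfall", quietly = TRUE)
```

Checking for the optional packages up front makes it easier to fall back to cpus = 1 or to a method other than "BiWt" when they are missing.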

Author(s)

Chuang Ma, Xiangfeng Wang

See Also

cor.pair, onegcc, cor.test.

Examples

## Not run: 
   data(rsgcc)        # load the sample data in the rsgcc package
   x <- rnaseq[1:4,]  # construct a GEMatrix from the RNA-Seq data of the first four genes

   # Run on one CPU for all possible gene pairs in the GEMatrix "x".
   # Do not calculate p-values for the computed correlations.
   cor.matrix(x, cpus = 1,
              cormethod = "GCC", style = "all.pairs",
              pernum = 0, sigmethod = "two.sided",
              output = "matrix")

   # Run on two CPUs; the snowfall package must be properly installed.
   # Calculate the p-values of the correlations with 2000 permutations.
   # Output the results in "paired" format.
   cor.matrix(x, cpus = 2,
              cormethod = "GCC", style = "all.pairs",
              pernum = 2000, sigmethod = "two.sided",
              output = "paired")

   # Calculate correlations on the pairs between the 1st, 2nd and 3rd genes in the GEMatrix "x".
   cor.matrix(x, cpus = 1,
              cormethod = "GCC", style = "pairs.between",
              var1.id = 1:3, var2.id = 1:3,
              pernum = 2000, sigmethod = "two.sided",
              output = "matrix")

   # Calculate correlations on the adjacent genes ((G1,G2), (G2,G3), (G3,G4), ...) in the GEMatrix "x".
   cor.matrix(x, cpus = 1,
              cormethod = "GCC", style = "adjacent.pairs",
              pernum = 2000, sigmethod = "two.sided",
              output = "matrix")

## End(Not run)

rsgcc documentation built on May 2, 2019, 9:25 a.m.