pairwiseCorSkew: pairwiseCorSkew
In genejockey33000/typGumbo: Scripts and Pipelines for RNAseq Analysis

pairwiseCorSkew

Description

Takes two measurement matrices (generally expression matrices) containing measurements from sets of samples to be checked for congruence. For example, it can be used to compare transcriptomes from stem cell derived cell cultures to transcriptomes from the same human subjects. In this case each column is a human subject and each row is a gene. Column names must use the same IDs in matrix 1 and matrix 2 but need not be in the same order. Some missing samples and measurements are allowed (the script will remove them).

pairwiseCorSkew returns a list with 2 components:

1) CWC, the column-wise correlation output, exported as a data frame containing, for all measurements, rvals, pvals, FDR (Benjamini-Hochberg), and family-wise error rate (FWER, Hochberg). If you use human ENSGs as measurement IDs, it will also map them to gene names and chromosomes.

2) A measurement of "skew": the absolute value of the sum of the 100 largest rvals divided by the absolute value of the sum of the smallest (most negative) rvals.
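The statistics described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper name `sketch_cor_skew`, the toy data, and the `n_top` argument are hypothetical, and using the same number of values at both ends of the skew ratio is an assumption (the description only gives 100 for the largest rvals).

```r
# Hypothetical helper for illustration; not part of typGumbo.
sketch_cor_skew <- function(x, y, method = "pearson", n_top = 100) {
  # Keep only samples (columns) and measurements (rows) present in both
  # matrices; indexing by name also puts them in the same order.
  samples <- intersect(colnames(x), colnames(y))
  feats   <- intersect(rownames(x), rownames(y))
  x <- x[feats, samples, drop = FALSE]
  y <- y[feats, samples, drop = FALSE]

  # One correlation test per measurement, across the shared samples.
  tests <- lapply(feats, function(f) {
    suppressWarnings(stats::cor.test(x[f, ], y[f, ], method = method))
  })
  rval <- vapply(tests, function(t) unname(t$estimate), numeric(1))
  pval <- vapply(tests, function(t) t$p.value, numeric(1))

  cwc <- data.frame(
    id   = feats,
    rval = rval,
    pval = pval,
    fdr  = stats::p.adjust(pval, method = "BH"),        # Benjamini-Hochberg
    fwer = stats::p.adjust(pval, method = "hochberg"),  # Hochberg
    stringsAsFactors = FALSE
  )

  # Skew: |sum of the largest rvals| / |sum of the smallest rvals|.
  # Assumption: the same count (n_top) is taken at both ends.
  sorted <- sort(rval)  # sort() drops NA values by default
  n <- min(n_top, length(sorted))
  skew <- abs(sum(utils::tail(sorted, n))) / abs(sum(utils::head(sorted, n)))

  list(CWC = cwc, skew = skew)
}

set.seed(1)
x <- matrix(rnorm(300 * 8), nrow = 300,
            dimnames = list(paste0("g", 1:300), paste0("s", 1:8)))
# y shares signal with x, lists its samples in reverse order, and lacks s8.
y <- (x + matrix(rnorm(300 * 8), nrow = 300))[, 7:1]
res <- sketch_cor_skew(x, y)
head(res$CWC)
res$skew
```

In this toy setup the two matrices are positively related, so the largest rvals outweigh the smallest ones and the skew ratio comes out above 1.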

Usage

pairwiseCorSkew(x, y, method = "pearson", pct = "1.0", iter = 1000)

Arguments

`x`	Matrix 1 with samples in columns and measurements (genes, transcripts, proteins, etc.) in rows. You should use the matrix from the best controlled samples for matrix 1, as it is used to identify minimally variant genes between samples. For example, if comparing differentiated neurons to human brain tissue, the differentiated neuron expression matrix should be x (matrix 1).
`y`	Matrix 2 with the same (or highly overlapping) samples and measurements as matrix 1.
`method`	The correlation method to use: "pearson", or rank-order "spearman".
`pct`	The fraction of measurements, ranked by relative variance (variance divided by mean), to include in the calculation. Many, maybe most, measurements will have minor variance that is simply noise. Measuring correlations for measurements that are relatively constant is undesirable. Set pct = .5 to include only measurements in the upper 50%. If the data have already been filtered to remove genes with noise-level variance, then this can be left at 1.
`iter`	The number of iterations (permutations of x) used to generate the null distribution. If you don't want a null distribution analysis, set to 0.
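The `pct` filter and the `iter` permutations might look roughly like the following base R sketch. It is an illustration under assumptions, not the package's code: the helper names `filter_by_relative_variance` and `null_mean_r` are hypothetical, a quantile cutoff is one plausible reading of "upper pct", shuffling the sample order of x is one plausible reading of "permutations of x", and the mean row-wise correlation is used as the summary statistic only because the documentation does not say which statistic the null distribution is built on. The sketch also assumes positive-valued data and matrices whose rows and columns are already aligned.

```r
# Hypothetical helper: keep rows of x in the upper `pct` fraction of
# relative variance (variance divided by mean).
filter_by_relative_variance <- function(x, pct = 1) {
  rel_var <- apply(x, 1, stats::var) / rowMeans(x)
  cutoff <- stats::quantile(rel_var, probs = 1 - pct, na.rm = TRUE)
  x[!is.na(rel_var) & rel_var >= cutoff, , drop = FALSE]
}

# Hypothetical helper: null distribution of the mean row-wise correlation,
# built by shuffling the sample (column) order of x `iter` times.
null_mean_r <- function(x, y, iter = 1000, method = "pearson") {
  row_cor <- function(a, b) {
    vapply(seq_len(nrow(a)),
           function(i) stats::cor(a[i, ], b[i, ], method = method),
           numeric(1))
  }
  observed <- mean(row_cor(x, y))
  null <- vapply(seq_len(iter), function(k) {
    mean(row_cor(x[, sample(ncol(x)), drop = FALSE], y))
  }, numeric(1))
  list(observed = observed, null = null)
}

set.seed(42)
# Toy positive-valued "expression" data: 200 measurements, 10 samples.
x <- matrix(rexp(200 * 10, rate = 0.1), nrow = 200,
            dimnames = list(paste0("g", 1:200), paste0("s", 1:10)))
y <- x + matrix(rexp(200 * 10, rate = 0.1), nrow = 200)

x_top <- filter_by_relative_variance(x, pct = 0.5)
y_top <- y[rownames(x_top), ]
perm <- null_mean_r(x_top, y_top, iter = 200)

# Empirical p-value: share of permutations at least as extreme as observed.
p_emp <- (sum(perm$null >= perm$observed) + 1) / (length(perm$null) + 1)
```

Filtering on x alone mirrors the note that matrix 1 is the one used to identify minimally variant genes; the same rows are then taken from y by name.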