View source: R/SYB_wrapPCAgoprom.R
Wrapper function for PCA routines from the pcaGoPromoter
package, including PCA plots and
enrichment analysis of PC loadings.
wrapPCAgoprom(expca, groupsoi = NULL, groupby = "Sample_Group",
sample.name.column = "Sample_Name", samples2exclude = NULL,
projectfolder = file.path("pcaGoPromoter"), projectname = NULL,
figure.res = 300, inputType = "geneSymbol", print.sample.names = TRUE,
print.symbol.colors = TRUE, org = "Hs",
annotation.packages = c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db"),
PCs4table = 2, PCs2plot = c(1, 2, 3), probes2enrich = 0.025)

expca 

groupsoi 
character vector with sample groups of interest to be included in PCA (if 
groupby 
character with column name of phenoData of 
sample.name.column 
Character with column name of phenoData of 
samples2exclude 
Character vector for optional exclusion of individual samples. Each entry is used as
a regular expression to look up the samples to exclude. 
projectfolder 
character with directory for output files (will be created if it does not exist). 
projectname 
optional character prefix for output file names. 
figure.res 
numeric resolution for png. 
inputType 
Character vector with description of the input type. Must be an Affymetrix chip type, "geneSymbol" or "entrezID". 
print.sample.names 
boolean indicating whether sample names shall be plotted in PCA plots (for pcainfoplot they are plotted anyway). 
print.symbol.colors 
boolean indicating whether the symbols should be plotted with colors. 
org 
a character vector specifying the organism. Either "Hs" (Homo sapiens), "Mm" (Mus musculus) or "Rn" (Rattus norvegicus). 
annotation.packages 
character with Bioconductor annotation packages to load, e.g. c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db") for human or c("pcaGoPromoter.Mm.mm9", "org.Mm.eg.db") for mouse. 
PCs4table 
numeric or numeric vector. Indicates the number of PCs (numeric) or distinct PCs (numeric vector) for which result tables of enriched transcription factor binding sites and GO terms are calculated. 
PCs2plot 
numeric or numeric vector. Indicates the number of PCs (numeric) or distinct PCs (numeric vector) to use in 2D and 3D PCA plots. For 2D PCA plots, all possible pairs of PCs are plotted. Additionally, a 3D plot is generated with the first 3 PCs in PCs2plot. Note that the PCA informative plot (containing TFBS and GO annotation on the axes) is restricted to the first two PCs only. 
probes2enrich 
numeric. Number of top PC-associated probes in which to look for enriched TFBS and GO terms.
A value 
The pcaGoPromoter::pca function uses prcomp to do the principal component analysis.
The input data is scaled and centered, so constant variables (sd = 0) are removed to avoid division by zero.
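The scaling step described above can be illustrated with base R alone. This is a minimal sketch, not the code used by pcaGoPromoter: the toy matrix `expr`, its probe and sample names, and the orientation (probes in rows, samples in columns) are assumptions made for illustration.

```r
# Toy expression matrix (hypothetical data): probes in rows, samples in columns.
set.seed(1)
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("probe", 1:10), paste0("S", 1:6)))
expr["probe3", ] <- 5  # a constant probe, sd = 0

# Drop constant probes first; scaling them would divide by zero.
probe_sd <- apply(expr, 1, sd)
expr_var <- expr[probe_sd > 0, , drop = FALSE]

# prcomp expects observations in rows, so samples become rows here.
pca_fit <- prcomp(t(expr_var), center = TRUE, scale. = TRUE)

dim(pca_fit$x)         # sample scores (samples x PCs)
dim(pca_fit$rotation)  # probe loadings (probes x PCs)
```

Without the filtering step, `prcomp(..., scale. = TRUE)` stops with an error about rescaling a constant column, which is why the constant probe has to go before the PCA.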
2D and 3D PCA plots are generated for the desired samples in the given ExpressionSet expca.
Tables of PC-associated probes, and of transcription factor binding sites and GO terms enriched in the top correlated probes,
are generated for any number of principal components in positive and negative orientation.
All output data is stored in the supplied projectfolder.
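How a PC specification might translate into the set of 2D and 3D plots can be sketched in a few lines of base R. The helper `expand_pcs` is hypothetical and not part of the package; it assumes that a single number n means "the first n PCs", which is one plausible reading of the PCs2plot description.

```r
# Hypothetical helper: turn a PC specification into explicit PC indices.
# Assumption: a single number n means PCs 1..n; a vector is taken as given.
expand_pcs <- function(pcs) {
  if (length(pcs) == 1) seq_len(pcs) else pcs
}

pcs2plot <- expand_pcs(c(1, 2, 3))

# All pairs of PCs, one 2D plot per pair.
pc_pairs <- combn(pcs2plot, 2, simplify = FALSE)

# The first three PCs, as used for the single 3D plot.
pcs3d <- head(pcs2plot, 3)

for (p in pc_pairs) cat("2D plot: PC", p[1], "vs PC", p[2], "\n")
```

With the default `PCs2plot = c(1, 2, 3)` this yields three 2D plots (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3) plus one 3D plot.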
Several plots and files are generated as side effects and are stored in the designated projectfolder. The returned value is a list of 4 objects.
PCA: Principal component matrix
loadsperPC: Top associated probes for every PC in positive and negative direction
TFtables: data frames containing enriched TFBS for every PC in positive and negative direction (over- and underrepresented)
GOtreeOutput: data frames containing enriched GO terms for every PC in positive and negative direction
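To make the idea behind loadsperPC concrete, the sketch below ranks probes by their loadings from a plain prcomp fit and keeps the top ones in each direction. It is an illustration only, not the package's implementation. The names `top_probes` and `loads_per_pc` and the toy data are hypothetical, and the sketch assumes that a probes2enrich value below 1 is read as a fraction of all probes, which the documentation above does not confirm.

```r
# Toy data (hypothetical): 200 probes in rows, 8 samples in columns.
set.seed(42)
expr <- matrix(rnorm(200 * 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("S", 1:8)))
pca_fit <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Assumption: a value below 1 is a fraction of all probes.
probes2enrich <- 0.025
n_top <- round(probes2enrich * nrow(expr))

# Names of the n probes with the largest (or smallest) loadings.
top_probes <- function(loadings, n, decreasing) {
  names(sort(loadings, decreasing = decreasing))[seq_len(n)]
}

# One entry per PC, each holding the positive and negative direction.
loads_per_pc <- lapply(c(PC1 = 1, PC2 = 2), function(pc) {
  l <- pca_fit$rotation[, pc]
  list(pos = top_probes(l, n_top, decreasing = TRUE),
       neg = top_probes(l, n_top, decreasing = FALSE))
})
str(loads_per_pc)
```

Probe sets like these would then be the input for the TFBS and GO enrichment tests that fill TFtables and GOtreeOutput.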
Frank Ruehle
