wrapPCAgoprom: Principal component analysis for gene expression data


View source: R/SYB_wrapPCAgoprom.R

Description

Wrapper function for PCA routines from the pcaGoPromoter package, including PCA plots and enrichment analysis of PC loadings.

Usage

wrapPCAgoprom(
  expca,
  groupsoi = NULL,
  groupby = "Sample_Group",
  sample.name.column = "Sample_Name",
  samples2exclude = NULL,
  projectfolder = file.path("pcaGoPromoter"),
  projectname = NULL,
  figure.res = 300,
  inputType = "geneSymbol",
  print.sample.names = TRUE,
  print.symbol.colors = TRUE,
  org = "Hs",
  annotation.packages = c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db"),
  PCs4table = 2,
  PCs2plot = c(1, 2, 3),
  probes2enrich = 0.025
)

Arguments

expca

ExpressionSet object, or a table of expression data with variables (probes) in rows and observations (samples) in columns. In the latter case, the rows of the data matrix must be named with probe identifiers of the type selected in inputType.
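
As an illustration of the matrix form of this input, the following minimal sketch builds a toy table with probes in rows and samples in columns, using gene symbols as row names (matching inputType = "geneSymbol"). The object names expr and group_vec and all values are made up for the example; group_vec shows the kind of per-sample vector that groupby expects when expca is not an ExpressionSet.

```r
# Toy expression matrix: 5 probes (rows) x 4 samples (columns).
# All names and values here are illustrative assumptions.
set.seed(1)
expr <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Row names carry the probe identifiers (gene symbols in this sketch).
rownames(expr) <- c("TP53", "BRCA1", "MYC", "EGFR", "GAPDH")
colnames(expr) <- paste0("sample", 1:4)

# Group assignments in the same order as the columns of the matrix.
group_vec <- c("control", "control", "treated", "treated")

# One group label per sample, and no duplicated probe identifiers.
stopifnot(length(group_vec) == ncol(expr))
stopifnot(!anyDuplicated(rownames(expr)))
```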

groupsoi

Character vector with sample groups of interest to be included in the PCA (if expca is an ExpressionSet). The respective samples are taken from the phenoData of expca. Group names must match entries in the column given in groupby.

groupby

Character with the column name of the phenoData of expca used for group names if expca is an ExpressionSet. Otherwise, groupby must be a vector of group assignments in the same order as the samples in the data matrix.

sample.name.column

Character with column name of phenoData of expca used for sample names

samples2exclude

Character vector for optional exclusion of individual samples. Used as a regular expression for lookup of samples. NULL if no samples are to be excluded.
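
Because the entries are treated as regular expressions, an unanchored pattern can match more samples than intended. The sketch below shows this with base R only. Collapsing the entries with "|" is an assumption made for the example and is not necessarily how wrapPCAgoprom performs the lookup; the sample names are invented.

```r
# Hypothetical sample names for illustration.
sample_names <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2", "treat_10")

# An unanchored pattern also hits "treat_10".
unanchored_hits <- sample_names[grepl("treat_1", sample_names)]

# Anchoring with ^ and $ restricts the match to the exact sample name.
samples2exclude <- c("ctrl_2", "^treat_1$")

# Assumption: combine all entries into a single alternation pattern.
pattern <- paste(samples2exclude, collapse = "|")
kept <- sample_names[!grepl(pattern, sample_names)]

print(unanchored_hits)
print(kept)
```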

projectfolder

Character with directory for output files (will be created if it does not exist).

projectname

optional character prefix for output file names.

figure.res

numeric resolution for png.

inputType

Character vector with description of the input type. Must be an Affymetrix chip type, "geneSymbol" or "entrezID".

print.sample.names

boolean indicating whether sample names shall be plotted in PCA plots (for pcainfoplot they are plotted anyway).

print.symbol.colors

boolean indicating whether the symbols should be plotted with colors.

org

Character vector specifying the organism. Either "Hs" (Homo sapiens), "Mm" (Mus musculus) or "Rn" (Rattus norvegicus).

annotation.packages

Character vector with Bioconductor annotation packages to load, e.g. c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db") for human or c("pcaGoPromoter.Mm.mm9", "org.Mm.eg.db") for mouse.

PCs4table

numeric or numeric vector. Indicates number of PCs (numeric) or distinct PCs (numeric vector) for which result tables of enriched transcription factor binding sites and GO-terms are calculated.

PCs2plot

Numeric or numeric vector. Indicates the number of PCs (numeric) or distinct PCs (numeric vector) to use in 2-dim and 3-dim PCA plots. For 2-dim PCA plots, all possible pairs of PCs are plotted. Additionally, a 3D plot is generated with the first 3 PCs in PCs2plot. Note that the PCA informative plot (with TFBS and GO annotation on the axes) is restricted to the first two PCs only.
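
To make "all possible pairs" concrete, the sketch below enumerates the PC pairs for a given PCs2plot value using base R's combn. The helper expand_pcs is hypothetical and only mirrors the description above (a single number is read as a count, a longer vector as distinct PCs); it is not taken from the package source.

```r
# Hypothetical helper: a single number n means PCs 1..n,
# a longer vector is taken as the distinct PCs themselves.
expand_pcs <- function(PCs2plot) {
  if (length(PCs2plot) == 1) seq_len(PCs2plot) else PCs2plot
}

pcs <- expand_pcs(c(1, 2, 3))

# Every unordered pair of PCs, one pair per row.
pc_pairs <- t(combn(pcs, 2))
colnames(pc_pairs) <- c("x_axis", "y_axis")
print(pc_pairs)

# The 3D plot would use the first three PCs of the selection.
pcs_3d <- head(pcs, 3)
```

With k selected PCs this gives k * (k - 1) / 2 two-dimensional plots, so the default c(1, 2, 3) corresponds to three pairs.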

probes2enrich

Numeric. Number of top PC-associated probes in which to look for enriched TFBS and GO terms. A value <= 1 is interpreted as a fraction of the total number of probes.
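
The two readings of this argument can be sketched as follows. The helper n_top_probes is hypothetical, and the use of round() for fractional values as well as the cap at the total probe count are assumptions for the example; the package may resolve these details differently.

```r
# Hypothetical helper translating probes2enrich into a probe count.
n_top_probes <- function(probes2enrich, n_probes) {
  if (probes2enrich <= 1) {
    # Value <= 1: fraction of all probes (rounding rule is an assumption).
    round(probes2enrich * n_probes)
  } else {
    # Value > 1: absolute count, capped at the number of probes available.
    min(probes2enrich, n_probes)
  }
}

from_fraction <- n_top_probes(0.025, 20000)  # default fraction on 20000 probes
from_count    <- n_top_probes(250, 20000)    # absolute number of probes
print(c(from_fraction, from_count))
```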

Details

The pcaGoPromoter::pca function uses prcomp to do the principal component analysis. The input data is scaled and centered, so constant variables (sd = 0) are removed to avoid division by zero. 2-dim and 3-dim PCA plots are generated for the desired samples in the given ExpressionSet expca. Tables of PC-associated probes, and of transcription factor binding sites and GO terms enriched in the top correlated probes, are generated for any number of principal components in positive and negative orientation. All output data is stored in the supplied projectfolder.
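
The preprocessing described above can be reproduced in outline with base R alone. This is a minimal sketch on simulated data, not the code used inside pcaGoPromoter::pca: it drops constant probes, transposes the matrix so that samples become observations, and runs prcomp with centering and scaling. All object names are illustrative.

```r
# Simulated data: 10 probes (rows) x 6 samples (columns).
set.seed(42)
expr <- matrix(rnorm(60), nrow = 10, ncol = 6,
               dimnames = list(paste0("probe", 1:10), paste0("s", 1:6)))
expr["probe3", ] <- 5  # a constant probe (sd = 0)

# Remove constant probes; scaling them would divide by zero.
probe_sd <- apply(expr, 1, sd)
expr_var <- expr[probe_sd > 0, , drop = FALSE]

# prcomp expects observations in rows, so transpose to samples x probes.
pca <- prcomp(t(expr_var), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained, 3))
```

The sample scores are then available in pca$x and the probe loadings in pca$rotation, which is the kind of information the PCA plots and the loading-based enrichment analysis build on.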

Value

Several plots and files are generated as side effects and stored in the designated projectfolder. The returned value is a list of 4 objects.

Author(s)

Frank Ruehle


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.