wrapPCAgoprom: Principal component analysis for gene expression data
In frankRuehle/systemsbio: Streamlined Analysis and Integration of Systems Biology Data

Description Usage Arguments Details Value Author(s)

Wrapper function for PCA routines from the pcaGoPromoter package, including PCA plots and enrichment analysis of PC loadings.

wrapPCAgoprom(
  expca,
  groupsoi = NULL,
  groupby = "Sample_Group",
  sample.name.column = "Sample_Name",
  samples2exclude = NULL,
  projectfolder = file.path("pcaGoPromoter"),
  projectname = NULL,
  figure.res = 300,
  inputType = "geneSymbol",
  print.sample.names = TRUE,
  print.symbol.colors = TRUE,
  org = "Hs",
  annotation.packages = c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db"),
  PCs4table = 2,
  PCs2plot = c(1, 2, 3),
  probes2enrich = 0.025
)

`expca`	`ExpressionSet` object or a table of expression data with variables (probes) in rows and observations (samples) in columns. In the latter case, rows of the data matrix must be named after the probe identifiers selected in `inputType`.
`groupsoi`	character vector with sample groups of interest to be included in the PCA (if `expca` is an `ExpressionSet`). The respective samples are taken from the `phenoData` of `expca`. Group names must match entries in the column given in `groupby`.
`groupby`	character with the column name of the `phenoData` of `expca` used for group names if `expca` is an `ExpressionSet`. Otherwise, `groupby` must be a vector of group assignments in the same order as the samples in the data matrix.
`sample.name.column`	character with the column name of the `phenoData` of `expca` used for sample names.
`samples2exclude`	character vector for optional exclusion of individual samples. Used as a regular expression for lookup of samples. `NULL` if no sample is to be excluded.
`projectfolder`	character with directory for output files (will be created if it does not exist).
`projectname`	optional character prefix for output file names.
`figure.res`	numeric resolution for png.
`inputType`	character vector describing the input type. Must be an Affymetrix chip type, "geneSymbol" or "entrezID".
`print.sample.names`	boolean indicating whether sample names shall be plotted in PCA plots (for pcainfoplot they are plotted anyway).
`print.symbol.colors`	boolean indicating whether the symbols should be plotted with colors.
`org`	a character vector specifying the organism. Either "Hs" (Homo sapiens), "Mm" (Mus musculus) or "Rn" (Rattus norvegicus).
`annotation.packages`	character with Bioconductor annotation packages to load, e.g. c("pcaGoPromoter.Hs.hg19", "org.Hs.eg.db") for human or c("pcaGoPromoter.Mm.mm9", "org.Mm.eg.db") for mouse.
`PCs4table`	numeric or numeric vector. Indicates number of PCs (numeric) or distinct PCs (numeric vector) for which result tables of enriched transcription factor binding sites and GO-terms are calculated.
`PCs2plot`	numeric or numeric vector. Indicates number of PCs (numeric) or distinct PCs (numeric vector) to use in 2-dim and 3-dim PCA plots. For 2-dim PCA plots, all possible pairs of PCs are plotted. Additionally, a 3D plot is generated with the first 3 PCs in `PCs2plot`. Note that the PCA informative plot (containing TFBS and GO annotation on the axes) is restricted to the first two PCs.
`probes2enrich`	numeric. Number of top PC-associated probes to look for enriched TFBS and GO terms. A value `<= 1` is interpreted as fraction of total number of probes.
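
The selection and counting rules described in the arguments above can be illustrated with a short base-R sketch. It is not the package's code: the helpers `expand_pcs` and `resolve_probe_count`, the toy `pheno` table, the rounding of the probe fraction, and the joining of several exclusion patterns with "|" are all assumptions made for illustration.

```r
# Toy phenotype table using the default column names from the usage above
pheno <- data.frame(
  Sample_Name  = c("ctrl_1", "ctrl_2", "treat_1", "treat_2", "treat_10"),
  Sample_Group = c("control", "control", "treated", "treated", "treated"),
  stringsAsFactors = FALSE
)

# groupsoi is matched against the column named in groupby
groupsoi <- c("control", "treated")
in_group <- pheno$Sample_Group %in% groupsoi

# Assumption: several exclusion patterns are joined with "|" into one regex
samples2exclude <- c("treat_1")
excluded <- grepl(paste(samples2exclude, collapse = "|"), pheno$Sample_Name)
selected <- pheno$Sample_Name[in_group & !excluded]
# Because this is a regex lookup, "treat_1" also matches "treat_10";
# an anchored pattern such as "^treat_1$" would exclude only that one sample.

# Hypothetical helper: a single number n means "the first n PCs",
# a longer vector is taken as the distinct PCs themselves.
expand_pcs <- function(pcs) {
  if (length(pcs) == 1) seq_len(pcs) else pcs
}

# Hypothetical helper: values <= 1 are read as a fraction of all probes.
# Rounding to a whole number of probes is an assumption.
resolve_probe_count <- function(probes2enrich, n_probes) {
  if (probes2enrich <= 1) round(probes2enrich * n_probes) else probes2enrich
}

pcs      <- expand_pcs(c(1, 2, 3))
pc_pairs <- combn(pcs, 2)   # one column per 2-dim plot: (1,2), (1,3), (2,3)
pcs_3d   <- head(pcs, 3)    # PCs that would go into the 3D plot

n_top <- resolve_probe_count(0.025, n_probes = 20000)
```

With the default `probes2enrich = 0.025` and a hypothetical 20000 probes, this reading would give 500 top probes per PC and direction.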

The pcaGoPromoter::pca function uses prcomp to do the principal component analysis. The input data is scaled and centered, so constant variables (sd = 0) will be removed to avoid division by zero. 2-dim and 3-dim PCA plots are generated for the desired samples in the given ExpressionSet expca. Tables of PC-associated probes, and of transcription factor binding sites and GO terms enriched in the top correlated probes, are generated for any number of principal components in positive and negative orientation. All output data is stored in the supplied projectfolder.
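
The scaling step and the removal of constant variables can be reproduced with base R alone. The following is a minimal sketch of that idea on a made-up matrix, not the implementation used by pcaGoPromoter::pca; the object names (`expr`, `keep`, `expr_var`) are illustrative.

```r
set.seed(1)

# Toy expression matrix: 6 probes (rows) x 5 samples (columns)
expr <- matrix(rnorm(30), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("sample", 1:5)))
expr["probe3", ] <- 1   # a constant probe, sd = 0

# Drop constant probes: scaling to unit variance would divide by zero
keep     <- apply(expr, 1, sd) > 0
expr_var <- expr[keep, , drop = FALSE]

# prcomp expects observations in rows, so transpose to samples x probes
pca <- prcomp(t(expr_var), center = TRUE, scale. = TRUE)

head(pca$x[, 1:2])   # sample scores on the first two PCs
summary(pca)         # proportion of variance explained per PC
```

Calling `prcomp` with `scale. = TRUE` on the unfiltered matrix stops with an error about constant columns, which is why the filter comes first.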

Several plots and files are generated as side effects and are stored in the designated projectfolder. The returned value is a list of 4 objects.

PCA: Principal component matrix
loadsperPC: Top associated probes for every PC in positive and negative direction
TFtables: data frames containing enriched TFBS for every PC in positive and negative direction (over- and underrepresented)
GOtreeOutput: data frames containing enriched GO terms for every PC in positive and negative direction

Frank Ruehle

frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.
