ebrowser: Seamless navigation through enrichment analysis results
In EnrichmentBrowser: Seamless navigation through combined results of set-based and network-based enrichment analysis


This is the all-in-one wrapper function to perform the standard enrichment analysis pipeline implemented in the EnrichmentBrowser package.

ebrowser(
  meth,
  exprs,
  cdat,
  rdat,
  org,
  data.type = c(NA, "ma", "rseq"),
  norm.method = "quantile",
  de.method = "limma",
  gs,
  grn = NULL,
  perm = 1000,
  alpha = 0.05,
  beta = 1,
  comb = FALSE,
  browse = TRUE,
  nr.show = -1,
  out.dir = NULL,
  report.name = "index",
  ...
)

`meth`	Enrichment analysis method(s). See `sbeaMethods` and `nbeaMethods` for currently supported enrichment analysis methods. See also `sbea` and `nbea` for details.
`exprs`	Expression matrix. A tab separated text file containing the expression values (microarray: intensity measurements, RNA-seq: read counts). Columns = samples/subjects; rows = features/probes/genes; NO headers, row or column names. Alternatively, this can be a `SummarizedExperiment`, assuming the expression matrix in the `assays` slot. See details.
`cdat`	Column (phenotype) data. A tab separated text file containing annotation information for the samples in either two or three columns. NO headers, row or column names. The number of rows/samples in this file should match the number of columns/samples of the expression matrix. The 1st column is reserved for the sample IDs; the 2nd column is reserved for a BINARY group assignment. Use '0' and '1' for unaffected (controls) and affected (cases) sample class, respectively. For paired samples or sample blocks, a third column is expected that defines the blocks. If 'exprs' is a `SummarizedExperiment`, the 'cdat' argument can be left unspecified; group and optional block assignments are then expected in columns named 'GROUP' (mandatory) and 'BLOCK' (optional) in the `colData` slot.
`rdat`	Row (feature) data. A tab separated text file containing annotation information for the features. In case of probe level data: exactly TWO columns; 1st col = probe/feature IDs; 2nd col = corresponding gene ID for each feature ID in 1st col. In case of gene level data: the gene IDs newline-separated (i.e. just one column). It is recommended to use ENTREZ gene IDs (to benefit from downstream visualization and exploration functionality of the EnrichmentBrowser). NO headers, row or column names. The number of rows (features/probes/genes) in this file should match the number of rows/features of the expression matrix. Alternatively, this can also be the ID of a recognized platform such as 'hgu95av2' (Affymetrix Human Genome U95 chip) or 'ecoli2' (Affymetrix E. coli Genome 2.0 Array). If 'exprs' is a `SummarizedExperiment`, the 'rdat' argument can be left unspecified, which then expects probe and corresponding Entrez Gene IDs in respectively named columns 'PROBEID' and 'ENTREZID' in the `rowData` slot.
`org`	Organism under investigation in KEGG three letter code, e.g. ‘hsa’ for ‘Homo sapiens’. See also `kegg.species.code` to convert your organism of choice to KEGG three letter code.
`data.type`	Expression data type. Use 'ma' for microarray and 'rseq' for RNA-seq data. If NA, data.type is automatically guessed. If the expression values in 'exprs' are decimal numbers they are assumed to be microarray intensities. Whole numbers are assumed to be RNA-seq read counts. Defaults to NA.
`norm.method`	Determines whether and how the expression data should be normalized. For available microarray normalization methods see the man page of the limma function `normalizeBetweenArrays`. For available RNA-seq normalization methods see the man page of the EDASeq function `betweenLaneNormalization`. Defaults to 'quantile', i.e. normalization is carried out so that quantiles between arrays/lanes/samples are equal. Use 'none' to indicate that the data is already normalized and should not be normalized by ebrowser. See the man page of `normalize` for details.
`de.method`	Determines which method is used for per-gene differential expression analysis. See the man page of `deAna` for details. Defaults to 'limma', i.e. differential expression is calculated based on the typical limma `lmFit` procedure. This can also be 'none' to indicate that DE analysis has already been carried out and should not be overwritten by `ebrowser` (applies only when `exprs` is given as a `SummarizedExperiment`).
`gs`	Gene sets. Either a list of gene sets (character vectors of gene IDs) or a text file in GMT format storing all gene sets under investigation.
`grn`	Gene regulatory network. Either an absolute file path to a tabular file or a character matrix with exactly THREE cols; 1st col = IDs of regulating genes; 2nd col = corresponding regulated genes; 3rd col = regulation effect. Use '+' and '-' for activation and inhibition, respectively.
`perm`	Number of permutations of the sample group assignments. Defaults to 1000. Can also be an integer vector matching the length of 'meth' to assign different numbers of permutations for different methods.
`alpha`	Statistical significance level. Defaults to 0.05.
`beta`	Log2 fold change significance level. Defaults to 1 (2-fold).
`comb`	Logical. Should results be combined if more than one enrichment method is selected? Defaults to FALSE.
`browse`	Logical. Should results be displayed in the browser for interactive exploration? Defaults to TRUE.
`nr.show`	Number of gene sets to show. By default, all statistically significant gene sets are displayed. Note that this only influences the number of gene sets for which additional visualization will be provided (typically only of interest for the top / significant gene sets). Selected enrichment methods and resulting flat gene set rankings still include the complete number of gene sets under study.
`out.dir`	Output directory. If `NULL`, defaults to a timestamp-generated subdirectory of `configEBrowser("OUTDIR.DEFAULT")`.
`report.name`	Character. Name of the HTML report. Defaults to `"index"`.
`...`	Additional arguments passed on to the individual building blocks.
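
The `gs` and `grn` inputs described above can be sketched in base R as follows. This is a minimal illustration, not package code: the set names, gene IDs, and the temporary file are made up, and the GMT layout assumed here is the common one (set name, description, then gene IDs, all tab-separated on one line per set).

```r
# Gene sets as a named list of character vectors of gene IDs.
# Set names and IDs are hypothetical, for illustration only.
gs <- list(
    set_A = c("1000", "1001", "1002"),
    set_B = c("1001", "2000")
)

# The same gene sets written as a GMT text file:
# one line per set = name, description, gene IDs (tab-separated).
gmt.lines <- vapply(
    names(gs),
    function(n) paste(c(n, "NA", gs[[n]]), collapse = "\t"),
    character(1)
)
gmt.file <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, gmt.file)

# Gene regulatory network as a character matrix with exactly three columns:
# regulating gene, regulated gene, regulation effect ('+' or '-').
grn <- rbind(
    c("1000", "1001", "+"),
    c("1001", "2000", "-")
)

# Basic shape checks before handing the objects to an analysis function.
stopifnot(is.character(grn), ncol(grn) == 3, all(grn[, 3] %in% c("+", "-")))
```

Either `gs` or `gmt.file` could then be supplied as the `gs` argument, and `grn` as the `grn` argument.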

Given flat gene expression data, the data is read in and subsequently subjected to chosen enrichment analysis methods.
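
As a rough sketch of what "flat" input looks like, the following base R snippet writes toy expression, column, and row data as tab-separated files without headers or row names, as the argument descriptions require. All values, sample IDs, gene IDs, and the helper `writeFlat` are hypothetical and only serve the illustration.

```r
# Toy data; all values and IDs are made up for illustration.
set.seed(42)
expr.mat <- matrix(round(rnorm(20, mean = 8, sd = 1), 2), nrow = 5, ncol = 4)
cdat <- data.frame(id = paste0("S", 1:4), group = c(0, 0, 1, 1))
rdat <- data.frame(gene = c("1000", "1001", "1002", "2000", "2001"))

# Hypothetical helper: write tab-separated text without headers or row names.
writeFlat <- function(x, file) {
    write.table(x, file = file, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    file
}

exprs.file <- writeFlat(expr.mat, tempfile(fileext = ".tab"))
cdat.file  <- writeFlat(cdat, tempfile(fileext = ".tab"))
rdat.file  <- writeFlat(rdat, tempfile(fileext = ".tab"))

# Read the files back and check the dimension constraints stated above:
# samples in cdat match columns of exprs, features in rdat match rows.
exprs.in <- read.delim(exprs.file, header = FALSE)
cdat.in  <- read.delim(cdat.file, header = FALSE)
rdat.in  <- read.delim(rdat.file, header = FALSE)
stopifnot(ncol(exprs.in) == nrow(cdat.in), nrow(exprs.in) == nrow(rdat.in))
```

The three file paths could then be passed as `exprs`, `cdat`, and `rdat`.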

The results from different methods can be combined and investigated in detail in the default browser.

*On data type and normalization:*

Normalization of high-throughput expression data is essential to make results within and between experiments comparable. Microarray (intensity measurements) and RNA-seq (read counts) data typically exhibit distinct features that need to be normalized for. This function wraps commonly used functionality from limma for microarray normalization and from EDASeq for RNA-seq normalization. For specific needs that deviate from standard normalizations, the user should always refer to more specific functions/packages. See also the limma user's guide http://www.bioconductor.org/packages/limma for definition and normalization of the different expression data types.
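
To make the idea behind the default 'quantile' setting concrete, here is a minimal base R sketch of quantile normalization across samples. It is not the limma or EDASeq implementation (those handle ties, missing values, and count data more carefully); the function name `quantileNormalize` and the toy matrix are assumptions made for illustration.

```r
# Hypothetical helper: force all columns (samples) to share the same
# empirical distribution, namely the mean of the per-sample sorted values.
quantileNormalize <- function(x) {
    ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
    sorted <- apply(x, 2, sort)                         # sort each sample
    ref    <- rowMeans(sorted)                          # reference distribution
    out    <- apply(ranks, 2, function(r) ref[r])       # map ranks to reference
    dimnames(out) <- dimnames(x)
    out
}

# Toy expression matrix: 4 features (rows) x 3 samples (columns).
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4)
x.norm <- quantileNormalize(x)

# After normalization every sample has identical sorted values.
x.sorted <- apply(x.norm, 2, sort)
stopifnot(all(abs(x.sorted - x.sorted[, 1]) < 1e-12))
```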

Microarray data is expected to be single-channel. For two-color arrays, it is expected here that normalization within arrays has already been carried out, e.g. using `normalizeWithinArrays` from limma.

RNA-seq data is expected to be raw read counts. Please note that normalization for downstream DE analysis, e.g. with edgeR and DESeq2, is not strictly necessary (and in some cases even discouraged) as many of these tools implement specific normalization approaches. See the vignettes of EDASeq, edgeR, and DESeq2 for details.
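
The rule used when `data.type` is left as NA (decimal values are taken as microarray intensities, whole numbers as RNA-seq read counts) can be sketched as below. The helper `guessDataType` is hypothetical and only mirrors the rule as described in the arguments; the package's internal check may differ in detail.

```r
# Hypothetical helper mirroring the documented heuristic:
# whole numbers -> 'rseq' (read counts), decimals -> 'ma' (intensities).
guessDataType <- function(x) {
    vals <- x[!is.na(x)]
    if (all(vals == round(vals))) "rseq" else "ma"
}

# Toy matrices, made up for illustration.
counts      <- matrix(c(10, 0, 3, 250), nrow = 2)
intensities <- matrix(c(7.31, 8.02, 6.5, 9.9), nrow = 2)

type.counts      <- guessDataType(counts)
type.intensities <- guessDataType(intensities)
```

A consequence of such a rule is that already transformed or normalized RNA-seq values (for example log counts) would look like microarray data, so setting `data.type` explicitly is the safer choice in those cases.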

None. Writes an HTML report and, if selected, opens the browser to explore results. If not instructed otherwise (via argument `out.dir`), the main HTML report and associated files are written to `configEBrowser("OUTDIR.DEFAULT")`. See `?configEBrowser` to change the location. If `browse=TRUE`, the HTML report is automatically opened in the default browser.

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

Limma User's guide: http://www.bioconductor.org/packages/limma

readSE to read expression data from file; probe2gene to transform probe to gene level expression; kegg.species.code to map a species name to its KEGG code; getGenesets to retrieve gene set databases such as GO or KEGG; compileGRN to construct a GRN from pathway databases; sbea to perform set-based enrichment analysis; nbea to perform network-based enrichment analysis; combResults to combine results from different methods; eaBrowse for exploration of resulting gene sets.

    # expression data from file
    exprs.file <- system.file("extdata/exprs.tab", package="EnrichmentBrowser")
    cdat.file <- system.file("extdata/colData.tab", package="EnrichmentBrowser")
    rdat.file <- system.file("extdata/rowData.tab", package="EnrichmentBrowser")
    
    # getting all human KEGG gene sets
    # hsa.gs <- getGenesets(org="hsa", db="kegg")
    gs.file <- system.file("extdata/hsa_kegg_gs.gmt", package="EnrichmentBrowser")
    hsa.gs <- getGenesets(gs.file)

    # output destination 
    out.dir <- configEBrowser("OUTDIR.DEFAULT") 

    # set-based enrichment analysis
    ebrowser( meth="ora", perm=0,
            exprs=exprs.file, cdat=cdat.file, rdat=rdat.file, 
            gs=hsa.gs, org="hsa", nr.show=3,
            out.dir=out.dir, report.name="oraReport")

    # compile a gene regulatory network from KEGG pathways
    hsa.grn <- compileGRN(org="hsa", db="kegg")
   
    # network-based enrichment analysis
    ebrowser( meth="ggea",
            exprs=exprs.file, cdat=cdat.file, rdat=rdat.file, 
            gs=hsa.gs, grn=hsa.grn, org="hsa", nr.show=3,
            out.dir=out.dir, report.name="ggeaReport")

    # combining results
    ebrowser( meth=c("ora", "ggea"), perm=0, comb=TRUE,
            exprs=exprs.file, cdat=cdat.file, rdat=rdat.file, 
            gs=hsa.gs, grn=hsa.grn, org="hsa", nr.show=3,
            out.dir=out.dir, report.name="combReport")