HTSanalyzeR2Pipe: An analysis pipeline for common phenotype data

View source: R/HTSanalyzeR2Pipe.R

Description

This function performs a complete analysis of common phenotype data.

Usage

HTSanalyzeR2Pipe(
  data4enrich,
  hits = character(),
  doGSOA = FALSE,
  doGSEA = TRUE,
  listOfGeneSetCollections,
  species = "Hs",
  initialIDs = "SYMBOL",
  keepMultipleMappings = TRUE,
  duplicateRemoverMethod = "max",
  orderAbsValue = FALSE,
  pValueCutoff = 0.05,
  pAdjustMethod = "BH",
  nPermutations = 1000,
  cores = 1,
  minGeneSetSize = 15,
  exponent = 1,
  verbose = TRUE,
  GSEA.by = "HTSanalyzeR2",
  keggGSCs = NULL,
  goGSCs = NULL,
  msigdbGSCs = NULL,
  doNWA = FALSE,
  nwaPvalues = NULL,
  interactionMatrix = NULL,
  reportDir = "HTSanalyzerReport",
  nwAnalysisGenetic = FALSE,
  nwAnalysisFdr = 0.001
)

Arguments

data4enrich

A numeric or integer vector of phenotypes named by gene identifiers.
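As a minimal sketch of the expected input shape, the base R snippet below builds a named numeric vector from a small table. The table, its column names (id, lfc) and the scores are hypothetical values chosen for illustration only.

```r
# Hypothetical phenotype table: one score per gene symbol.
scores <- data.frame(
  id  = c("TP53", "MYC", "EGFR", "BRCA1"),
  lfc = c(-2.4, 1.1, 3.0, -0.3),
  stringsAsFactors = FALSE
)

# A numeric vector of phenotypes named by gene identifiers.
data4enrich <- setNames(as.numeric(scores$lfc), scores$id)

# Basic sanity checks before passing the vector on.
stopifnot(is.numeric(data4enrich), !is.null(names(data4enrich)))
```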

hits

A character vector of gene identifiers used as hits in the hypergeometric tests. It is required only for GSOA (gene set overrepresentation analysis).

doGSOA

A logical value specifying whether to perform the hypergeometric test. The default is FALSE.
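To illustrate what a hypergeometric overrepresentation test computes, here is a sketch in base R using stats::phyper. The universe, hits and gene set are made-up identifiers, and this is not necessarily how the package implements the test internally.

```r
# Hypothetical universe of 100 genes, 10 hits, and a 10-gene set sharing 5 hits.
universe <- paste0("g", 1:100)
hits     <- paste0("g", 1:10)
gene_set <- paste0("g", c(1:5, 50:54))

N <- length(universe)                       # all genes
K <- length(intersect(gene_set, universe))  # gene-set members in the universe
n <- length(intersect(hits, universe))      # number of hits
k <- length(intersect(hits, gene_set))      # hits that fall in the gene set

# P(X >= k): probability of seeing at least k gene-set members among n hits.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
```

The k - 1 in the call matters: with lower.tail = FALSE, phyper returns P(X > q), so passing k - 1 yields the upper tail that includes k itself.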

doGSEA

A logical value specifying whether to perform gene set enrichment analysis. The default is TRUE.

listOfGeneSetCollections

A list of gene set collections (a 'gene set collection' is a list of gene sets).

species

A single character value specifying the species for which the data should be read.

initialIDs

A single character value specifying the type of the initial identifiers of the input gene list.

keepMultipleMappings

A single logical value. If TRUE, the function keeps the entries with multiple mappings (first mapping is kept). If FALSE, the entries with multiple mappings will be discarded.

duplicateRemoverMethod

A single character value specifying the method to remove the duplicates. See duplicateRemover for details.

orderAbsValue

A single logical value indicating whether the values should be converted to absolute values and then ordered (if TRUE), or ordered as they are (if FALSE).
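The two ordering modes can be sketched in base R as follows. The helper name order_phenotypes is hypothetical, and the sketch assumes a decreasing sort, which is the usual convention for ranked gene lists but is not stated in this page.

```r
# Hypothetical helper mirroring the orderAbsValue switch (assumes decreasing order).
order_phenotypes <- function(x, orderAbsValue = FALSE) {
  if (orderAbsValue) {
    x <- abs(x)  # convert to absolute values first, then order
  }
  x[order(x, decreasing = TRUE)]
}

x <- c(a = -3, b = 1, c = 2)
as_is  <- order_phenotypes(x)                        # order: c, b, a
by_abs <- order_phenotypes(x, orderAbsValue = TRUE)  # order: a, c, b
```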

pValueCutoff

A single numeric value specifying the cutoff for p-values considered significant in gene set collection analysis.

pAdjustMethod

A single character value specifying the p-value adjustment method to be used (see 'p.adjust' for details) in gene set collection analysis.
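As a small illustration of how pAdjustMethod and pValueCutoff interact, the sketch below adjusts a made-up vector of p-values with stats::p.adjust and then applies a cutoff. Whether the package compares with "<" or "<=" is not stated here, so the strict comparison is an assumption.

```r
# Hypothetical raw p-values for five gene sets.
p_raw <- c(setA = 0.001, setB = 0.01, setC = 0.03, setD = 0.04, setE = 0.2)

# Benjamini-Hochberg adjustment, as selected by pAdjustMethod = "BH".
p_adj <- p.adjust(p_raw, method = "BH")

# Keep gene sets whose adjusted p-value falls below the cutoff (assumed strict).
pValueCutoff <- 0.05
significant <- names(p_adj)[p_adj < pValueCutoff]
```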

nPermutations

A single integer or numeric value specifying the number of permutations for deriving p-values in GSEA.
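The role of nPermutations can be sketched with a generic permutation test in base R. The statistic (mean score of the gene set), the simulated data and the "+1" estimator are all assumptions made for illustration; this is not the package's GSEA procedure.

```r
set.seed(1)

# Simulated scores for 200 hypothetical genes; the gene set is shifted upwards.
scores   <- setNames(rnorm(200), paste0("g", 1:200))
gene_set <- paste0("g", 1:20)
scores[gene_set] <- scores[gene_set] + 1

nPermutations <- 1000
observed <- mean(scores[gene_set])

# Null distribution: the same statistic on random gene sets of equal size.
null_stats <- replicate(nPermutations, mean(sample(scores, length(gene_set))))

# Empirical p-value; the +1 terms keep the estimate strictly above zero.
p_value <- (sum(null_stats >= observed) + 1) / (nPermutations + 1)
```

More permutations give a finer resolution for small p-values: with this estimator the smallest attainable value is 1 / (nPermutations + 1).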

cores

A single integer or numeric value specifying the number of cores to be used for GSEA.

minGeneSetSize

A single integer or numeric value specifying the minimum number of elements shared by a gene set and the input total genes. Gene sets with fewer than this number are removed from both hypergeometric analysis and GSEA.
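The size filter described above can be sketched in base R as follows. The helper name filter_gene_sets and the toy collection are hypothetical; the sketch only mirrors the rule that gene sets sharing fewer than minGeneSetSize elements with the input genes are dropped.

```r
# Hypothetical helper: keep gene sets with enough members among the input genes.
filter_gene_sets <- function(gsc, universe, minGeneSetSize = 15) {
  overlap <- vapply(gsc, function(gs) length(intersect(gs, universe)), integer(1))
  gsc[overlap >= minGeneSetSize]
}

universe <- paste0("g", 1:50)
gsc <- list(
  big   = paste0("g", 1:20),   # 20 members in the universe
  small = paste0("g", 1:5),    # only 5 members in the universe
  mixed = paste0("g", 40:70)   # 31 genes, but only 11 are in the universe
)

kept <- filter_gene_sets(gsc, universe, minGeneSetSize = 15)
```

Note that the comparison uses the overlap with the input genes, not the raw size of the gene set, which is why the 31-gene set in the toy data is removed.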

exponent

A single integer or numeric value used in weighting phenotypes in GSEA.
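To show what the weighting exponent does, here is a sketch of the running-sum enrichment score in the style of Subramanian et al. (2005), written in base R. The function name enrichment_score and the toy data are hypothetical, and the package's own implementation may differ in details such as tie handling.

```r
# Hypothetical helper: weighted Kolmogorov-Smirnov-like running sum.
# Assumes at least one gene inside and one outside the set, and a nonzero hit weight.
enrichment_score <- function(stats, gene_set, exponent = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% gene_set

  # Hits step up in proportion to |score|^exponent; misses step down uniformly.
  hit_weight <- abs(stats)^exponent * in_set
  step_hit   <- hit_weight / sum(hit_weight)
  step_miss  <- (!in_set) / sum(!in_set)

  running <- cumsum(step_hit - step_miss)
  unname(running[which.max(abs(running))])  # maximum deviation from zero
}

stats    <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2)
gene_set <- c("g1", "g4")

es_weighted   <- enrichment_score(stats, gene_set, exponent = 1)  # 0.75
es_unweighted <- enrichment_score(stats, gene_set, exponent = 0)  # 0.5
```

With exponent = 0 every hit contributes equally, which reduces the score to a classical Kolmogorov-Smirnov statistic; larger exponents give more weight to genes with extreme phenotypes.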

verbose

A single logical value specifying whether to display detailed messages (verbose=TRUE) or not (verbose=FALSE).

GSEA.by

A single character value selecting the algorithm used for GSEA. Valid values are "HTSanalyzeR2" (default) and "fgsea". If "fgsea" is used, refer to the fgsea documentation for an explanation of the results.

keggGSCs

A character vector of names of all KEGG gene set collections.

goGSCs

A character vector of names of all GO gene set collections.

msigdbGSCs

A character vector of names of all MSigDB gene set collections.

doNWA

A logical value specifying whether to perform subnetwork analysis. The default is FALSE.

nwaPvalues

A single numeric value specifying the false discovery rate for the scoring of nodes in NWA (see BioNet::scoreNodes and Dittrich et al., 2008 for details).

interactionMatrix

An interaction matrix including columns 'InteractionType', 'InteractorA' and 'InteractorB'. If this matrix is available, the interactome can be directly built based on it.
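A minimal sketch of an object with the three required columns, built in base R, is shown below. The gene pairs and the interaction type labels ("physical", "genetic") are hypothetical; the identifier type and the label vocabulary the package expects are not stated in this page.

```r
# Hypothetical interactions; one row per interaction.
interactionMatrix <- matrix(
  c("physical", "TP53", "MDM2",
    "physical", "MYC",  "MAX",
    "genetic",  "EGFR", "KRAS"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("InteractionType", "InteractorA", "InteractorB"))
)

# Check that the required columns are present before using the matrix.
required <- c("InteractionType", "InteractorA", "InteractorB")
stopifnot(all(required %in% colnames(interactionMatrix)))

# Example of dropping genetic interactions by hand (assumed label "genetic").
non_genetic <- interactionMatrix[
  interactionMatrix[, "InteractionType"] != "genetic", , drop = FALSE
]
```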

reportDir

A single character value specifying the directory in which to store reports. By default, the enrichment analysis reports are stored in a directory called "HTSanalyzerReport".

nwAnalysisGenetic

A single logical value. If TRUE, genetic interactions will be kept; otherwise, they will be removed from the data set.

nwAnalysisFdr

A single numeric value specifying the false discovery rate for the scoring of nodes (see BioNet::scoreNodes and Dittrich et al., 2008 for details).

Value

This pipeline function returns a list containing a GSCA object and an NWA object.

Examples

## Not run: 
library(HTSanalyzeR2)
library(GO.db)
library(org.Hs.eg.db)
library(KEGGREST)

data(d7)
## define data4enrich
data4enrich <- as.vector(d7$neg.lfc)
names(data4enrich) <- d7$id

## select hits if you also want to do GSOA, otherwise ignore it
hits <- names(data4enrich[which(abs(data4enrich) > 2)])

## set up a list of gene set collections
GO_MF <- GOGeneSets(species="Hs", ontologies=c("MF"))
PW_KEGG <- KeggGeneSets(species="Hs")
ListGSC <- list(GO_MF=GO_MF, PW_KEGG=PW_KEGG)

## start analysis
rslt <- HTSanalyzeR2Pipe(data4enrich = data4enrich,
                         hits = hits,
                         doGSOA = TRUE,
                         doGSEA = TRUE,
                         listOfGeneSetCollections = ListGSC,
                         species = "Hs",
                         initialIDs = "SYMBOL",
                         pValueCutoff = 0.05,
                         nPermutations = 1000,
                         cores = 2,
                         minGeneSetSize = 100,
                         keggGSCs=c("PW_KEGG"),
                         goGSCs = c("GO_MF"),
                         doNWA = FALSE)

report(rslt$gsca)
## End(Not run)

CityUHK-CompBio/HTSanalyzeR2 documentation built on Dec. 3, 2020, 2:35 a.m.