KEGG_module: KEGG local concordant/discordant module detection algorithm:...

View source: R/KEGG_module.R

KEGG_module    R Documentation

KEGG local concordant/discordant module detection algorithm: local module memberships and corresponding p-values at different module sizes

Description

KEGG_module identifies concordant or discordant subnetworks in KEGG pathways from topological regulatory information. It generates local module memberships and the corresponding p-values at different module sizes.
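To make the scoring idea concrete, here is a minimal, hypothetical sketch in base R. It is not CAMO's implementation. It assumes that a module is scored by the average shortest path between its nodes on an unweighted, undirected pathway graph, and that the p-value compares this score with random node sets of the same size. The helpers `bfs_dist`, `avg_sp` and `perm_pvalue` are illustrative names only.

```r
# Hypothetical sketch, not the CAMO code: score a candidate module by the
# average shortest path among its nodes and compare it with random node sets.

# Breadth-first search distances from `from` in an unweighted, undirected
# graph given as a logical adjacency matrix (Inf = unreachable).
bfs_dist <- function(adj, from) {
  dist <- rep(Inf, nrow(adj))
  dist[from] <- 0
  queue <- from
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(adj[v, ] & is.infinite(dist))) {
      dist[w] <- dist[v] + 1
      queue <- c(queue, w)
    }
  }
  dist
}

# Average shortest path over all unordered pairs of module nodes.
avg_sp <- function(adj, module) {
  d <- sapply(module, function(v) bfs_dist(adj, v)[module])
  mean(d[upper.tri(d)])
}

# Permutation p-value: a small average shortest path means a tight module,
# so count random node sets that are at least as tight (with a +1 correction).
perm_pvalue <- function(adj, module, B = 1000) {
  obs <- avg_sp(adj, module)
  null <- replicate(B, avg_sp(adj, sample(nrow(adj), length(module))))
  (sum(null <= obs) + 1) / (B + 1)
}
```

On a path graph 1-2-3-4-5-6, the module c(1, 2, 3) has an average shortest path of 4/3, while c(1, 3, 5) has 8/3. Disconnected node pairs give Inf in this sketch, so a real implementation needs an explicit policy for them.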

Usage

KEGG_module(
  mcmc.merge.list,
  dataset.names,
  KEGGspecies = "hsa",
  KEGGpathwayID,
  KEGG.dataGisTopologyG = FALSE,
  KEGG.dataG2topologyG = NULL,
  data.pair,
  gene_type = c("discordant", "concordant"),
  DE_PM_cut = 0.2,
  minM = 4,
  maxM = NULL,
  B = 1000,
  cores = 1,
  search_method = c("Exhaustive", "SA"),
  reps_eachM = 1,
  topG_from_previous = 1,
  Tm0 = 10,
  mu = 0.95,
  epsilon = 1e-05,
  N = 3000,
  Elbow_plot = T,
  filePath = getwd(),
  seed = 12345,
  sep = "-"
)

Arguments

mcmc.merge.list:

a list of merged MCMC output matrices.

dataset.names:

a character vector of dataset names corresponding to the elements of mcmc.merge.list.

KEGGspecies:

the KEGG species abbreviation. Default is "hsa".

KEGGpathwayID:

a KEGG pathway ID without the organism prefix (for example, "04670").

KEGG.dataGisTopologyG:

a logical value indicating whether the gene names in the data are the same as the entries in the KEGG topology. If TRUE, topology nodes/entries are searched directly by the data gene names. Default is FALSE.

KEGG.dataG2topologyG:

a data frame that maps gene names in mcmc.merge.list (first column) to entries in the KEGG topology (second column). If NULL and KEGG.dataGisTopologyG = FALSE, gene symbols are mapped automatically depending on KEGGspecies:

  • "hsa", "mmu" or "rno": gene symbols are mapped to Entrez IDs with the Bioconductor packages "org.Hs.eg.db", "org.Mm.eg.db" and "org.Rn.eg.db", respectively.

  • "cel": gene symbols are mapped to WormBase sequence names with the Bioconductor package "biomaRt", and the prefix "CELE_" is added to match entry names in the KEGG topology for Caenorhabditis elegans.

  • "dme": gene symbols are mapped to Entrez IDs and then to FlyBase CG IDs with the Bioconductor package "org.Dm.eg.db", and the prefix "Dmel_" is added to match entry names in the KEGG topology for Drosophila melanogaster.

data.pair:

a character vector of two study names.

gene_type:

the type of module of interest. This should be one of "concordant" or "discordant".

DE_PM_cut:

only concordant/discordant genes whose posterior mean of DE indicators exceeds this value are considered when searching for modules. Default is 0.2.

minM:

the minimum module size to consider during the search. Default is 4.

maxM:

the maximum module size to consider during the search. If NULL, the maximum module size is the total number of concordant/discordant genes.

B:

the number of permutations.

cores:

the number of cores to use for the permutations (passed as the mc.cores argument of 'mclapply').

search_method:

the method used to search for modules with a small average shortest path. This should be one of "Exhaustive" or "SA" (simulated annealing).

reps_eachM:

the number of search repetitions at each module size when "SA" is selected.

topG_from_previous:

the number of top module results stored as initial states for the next module size when "SA" is selected.

Tm0:

SA parameter - the initial temperature.

mu:

SA parameter - the temperature multiplier.

epsilon:

SA parameter - the final temperature.

N:

SA parameter - the maximum number of annealing iterations.

Elbow_plot:

a logical value indicating if an elbow plot of -log10(p-value) at each module size will be saved.

filePath:

the path to save the elbow plot. Default is the current working directory.

seed:

the random seed for the permutations.
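The SA parameters Tm0, mu, epsilon and N can be read as a geometric cooling schedule. The sketch below is hypothetical and assumes that reading: the temperature starts at Tm0, is multiplied by mu at each step, and annealing stops once it falls below epsilon or after N steps. It also shows the standard Metropolis rule for accepting a move that changes the average shortest path by `delta`. The names `sa_schedule` and `accept_move` are illustrative and are not part of the package.

```r
# Hypothetical geometric cooling schedule (assumed reading of Tm0, mu,
# epsilon and N): Tm_k = Tm0 * mu^k, stopping below epsilon or after N steps.
sa_schedule <- function(Tm0 = 10, mu = 0.95, epsilon = 1e-05, N = 3000) {
  temps <- numeric(0)
  Tm <- Tm0
  k <- 0
  while (Tm > epsilon && k < N) {
    temps <- c(temps, Tm)
    Tm <- Tm * mu
    k <- k + 1
  }
  temps
}

# Metropolis acceptance: always accept an improvement (delta <= 0, i.e. a
# smaller average shortest path); accept a worse module with probability
# exp(-delta / Tm), which shrinks as the temperature cools.
accept_move <- function(delta, Tm) {
  delta <= 0 || runif(1) < exp(-delta / Tm)
}
```

Under this reading, the defaults reach epsilon after 270 temperature levels, so N = 3000 would not be the binding limit. How KEGG_module actually combines N with the temperature schedule is not stated on this page.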

Value

A list containing 7 elements:

  • minG.ls: results for each module size from minM to maxM. minG contains the genes in the module whose average shortest path is optimized. p.mean, p.sd and sp are the p-value, its standard deviation (sd) and the average shortest path, respectively. null.sp.mean and null.sp.median are the mean and median of the permuted null distribution. If "SA" is selected, the top topG_from_previous results at each module size are stored in top.G.

  • bestSize: minG.ls results for the largest module size within 2 sd of the smallest p-value.

  • mergePMmat: a merged posterior DE mean matrix based on topology nodes (one node can have multiple genes).

  • KEGGspecies: the KEGG species abbreviation.

  • KEGGpathwayID: the KEGG pathway ID.

  • data.pair: the two study names.

  • module.type: discordant or concordant modules.

In addition, if Elbow_plot is TRUE, an elbow plot of -log10(p-value) for each module size is saved to filePath.
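The bestSize rule can be illustrated with made-up numbers. This is a hypothetical sketch of one reading of "the largest module size within 2 sd of the smallest p-value": it assumes the sd used is the p.sd at the size with the smallest p.mean. The function `pick_best_size` is illustrative and is not part of the package.

```r
# Hypothetical reading of the bestSize rule: find the size with the smallest
# p.mean, then keep every size whose p.mean is within 2 * p.sd (of that best
# size) of it, and return the largest such size.
pick_best_size <- function(sizes, p.mean, p.sd) {
  i_min <- which.min(p.mean)
  within <- p.mean <= p.mean[i_min] + 2 * p.sd[i_min]
  max(sizes[within])
}

# Illustrative values only (not real CAMO output).
sizes  <- 4:8
p.mean <- c(0.05, 0.01, 0.012, 0.03, 0.2)
p.sd   <- c(0.005, 0.003, 0.003, 0.005, 0.01)

# The y-values of an elbow plot: larger -log10(p) means a stronger module.
elbow_y <- -log10(p.mean)
```

Here the smallest p.mean (0.01) occurs at size 5, and the threshold is 0.01 + 2 * 0.003 = 0.016, so sizes 5 and 6 qualify and the rule picks 6. A plot such as `plot(sizes, elbow_y, type = "b")` gives an elbow plot of the same kind described above.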

Examples

## Not run: 
# mcmc.merge.list comes from the merge step (see the example for the function 'merge')
dataset.names = c("hb", "hs", "ht", "ha", "hi", "hl",
                  "mb", "ms", "mt", "ma", "mi", "ml")
res_hsa04670 = KEGG_module(mcmc.merge.list, dataset.names, KEGGspecies = "hsa",
                           KEGGpathwayID = "04670", data.pair = c("hs", "ms"),
                           gene_type = "discordant",
                           DE_PM_cut = 0.2, minM = 4, maxM = NULL,
                           B = 1000, cores = 1,
                           search_method = "Exhaustive",
                           Elbow_plot = TRUE, filePath = getwd())

## End(Not run)
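If the automatic symbol mapping does not suit your data, KEGG.dataG2topologyG accepts a two-column data frame. The sketch below builds such a mapping by hand. It assumes, as the automatic "hsa" mapping suggests, that the topology entries are Entrez Gene IDs without an organism prefix. The column names are hypothetical, since only the column order (data gene name first, topology entry second) is documented. The call to KEGG_module is commented out because it needs the merged MCMC output.

```r
# Hypothetical hand-made mapping for KEGG.dataG2topologyG.
# Column 1: gene names as they appear in mcmc.merge.list.
# Column 2: KEGG topology entries (assumed here to be Entrez Gene IDs).
gene_map <- data.frame(
  data_gene  = c("TP53", "EGFR", "MYC"),
  kegg_entry = c("7157", "1956", "4609"),
  stringsAsFactors = FALSE
)

# Requires mcmc.merge.list and dataset.names from the example above:
# res = KEGG_module(mcmc.merge.list, dataset.names, KEGGspecies = "hsa",
#                   KEGGpathwayID = "04670", data.pair = c("hs", "ms"),
#                   KEGG.dataG2topologyG = gene_map,
#                   gene_type = "discordant")
```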

CAMO-R/Rpackage documentation built on July 20, 2023, 6:04 a.m.