Rpackage: Perform cross-species resemblance analysis

KEGG_module

R Documentation

KEGG local concordant/discordant module detection algorithm: local module memberships and corresponding p-values at different module sizes

Description

KEGG_module identifies concordant or discordant subnetworks (modules) in a KEGG pathway using topological regulatory information. It generates local module memberships and the corresponding p-values at different module sizes.
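As a rough illustration of the idea, the sketch below scores a candidate module by the average shortest path between its nodes and compares that score with random node sets of the same size. It uses base R only. The toy graph, the helper names (`shortest_paths`, `avg_sp`, `perm_pvalue`) and the simple one-sided permutation p-value are assumptions made for illustration; the package's actual scoring and permutation scheme may differ.

```r
# Toy undirected topology with 6 nodes in a chain (hypothetical, not a real KEGG pathway).
nodes <- c("A", "B", "C", "D", "E", "F")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "E"), c("E", "F"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# All-pairs shortest path lengths (Floyd-Warshall on unit edge weights).
shortest_paths <- function(adj) {
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(d))) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Average shortest path over all pairs of nodes in a module.
avg_sp <- function(d, module) {
  sub <- d[module, module]
  mean(sub[upper.tri(sub)])
}

# Assumed permutation scheme: draw B random node sets of the same size and
# count how often they are at least as compact as the observed module.
perm_pvalue <- function(d, module, B = 1000, seed = 12345) {
  set.seed(seed)
  obs <- avg_sp(d, module)
  null <- replicate(B, avg_sp(d, sample(rownames(d), length(module))))
  (sum(null <= obs) + 1) / (B + 1)
}

d <- shortest_paths(adj)
module_sp <- avg_sp(d, c("A", "B", "C"))
module_p <- perm_pvalue(d, c("A", "B", "C"), B = 200)
```

A smaller average shortest path means the selected genes sit closer together on the topology, which is why a small score relative to the permuted null suggests a local module.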

Usage

KEGG_module(
  mcmc.merge.list,
  dataset.names,
  KEGGspecies = "hsa",
  KEGGpathwayID,
  KEGG.dataGisTopologyG = FALSE,
  KEGG.dataG2topologyG = NULL,
  data.pair,
  gene_type = c("discordant", "concordant"),
  DE_PM_cut = 0.2,
  minM = 4,
  maxM = NULL,
  B = 1000,
  cores = 1,
  search_method = c("Exhaustive", "SA"),
  reps_eachM = 1,
  topG_from_previous = 1,
  Tm0 = 10,
  mu = 0.95,
  epsilon = 1e-05,
  N = 3000,
  Elbow_plot = T,
  filePath = getwd(),
  seed = 12345,
  sep = "-"
)

Arguments

`mcmc.merge.list:`	a list of merged MCMC output matrices.
`dataset.names:`	a vector of dataset names matched with the mcmc.merge.list.
`KEGGspecies:`	the KEGG species abbreviation. Default is "hsa".
`KEGGpathwayID:`	a KEGG pathway ID, not including the organism prefix.
`KEGG.dataGisTopologyG:`	whether gene names in the data are the same as the entries in the KEGG topology. If TRUE, topology nodes/entries are searched by the data gene names directly. Default is FALSE.
`KEGG.dataG2topologyG:`	a data frame that maps gene names in mcmc.merge.list (first column) to entries in the KEGG topology (second column). If NULL, KEGG.dataGisTopologyG = FALSE and KEGGspecies is "hsa", "mmu" or "rno", gene symbols are automatically mapped to Entrez IDs by the Bioconductor packages "org.Hs.eg.db", "org.Mm.eg.db" and "org.Rn.eg.db", respectively. If NULL, KEGG.dataGisTopologyG = FALSE and KEGGspecies = "cel", gene symbols are automatically mapped to WormBase sequence names by the Bioconductor package "biomaRt", with the prefix "CELE_" added to match entry names in the KEGG topology for Caenorhabditis elegans. If NULL, KEGG.dataGisTopologyG = FALSE and KEGGspecies = "dme", gene symbols are automatically mapped to Entrez IDs and then to FlyBase CG IDs by the Bioconductor package "org.Dm.eg.db", with the prefix "Dmel_" added to match entry names in the KEGG topology for Drosophila melanogaster.
`data.pair:`	a character vector of two study names.
`gene_type:`	the type of module of interest. This should be one of "concordant" or "discordant".
`DE_PM_cut:`	only concordant/discordant genes with posterior mean of DE indicators above this value will be considered when searching for modules.
`minM:`	the minimum module size to consider during searching.
`maxM:`	the maximum module size to consider during searching. If NULL, maximum module size will be the number of all concordant/discordant genes.
`B:`	the number of permutations.
`cores:`	the number of cores to use in permutation (mc.cores parameter in 'mclapply' function).
`search_method:`	the method used to search modules with small average shortest path. This should be one of "Exhaustive" or "SA" (Simulated-Annealing).
`reps_eachM:`	the number of searching repetitions at each module size when SA is selected.
`topG_from_previous:`	the number of top module results stored as initial values for the next module size when SA is selected.
`Tm0:`	SA parameter - the initial temperature.
`mu:`	SA parameter - the temperature multiplier.
`epsilon:`	SA parameter - the final temperature.
`N:`	SA parameter - the maximum number of annealing times.
`Elbow_plot:`	a logical value indicating if an elbow plot of -log10(p-value) at each module size will be saved.
`filePath:`	the path to save the elbow plot. Default is the current working directory.
`seed:`	permutation seed.
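To make the roles of the SA parameters concrete, here is a minimal simulated-annealing sketch in base R. It assumes a geometric cooling schedule (temperature starts at `Tm0`, is multiplied by `mu` after each step, and the search stops once it falls below `epsilon` or after `N` steps) and a single-node swap proposal with a Metropolis acceptance rule. The toy distance matrix and the functions `avg_sp` and `sa_search` are hypothetical, and the package's own search may be organized differently.

```r
# Toy distance matrix: 10 nodes on a line, so d[i, j] = |i - j| (hypothetical data).
d <- abs(outer(1:10, 1:10, "-"))

# Average shortest path over all pairs of nodes in a module.
avg_sp <- function(d, module) {
  sub <- d[module, module]
  mean(sub[upper.tri(sub)])
}

# Assumed SA search for a size-M node set with small average shortest path.
sa_search <- function(d, M, Tm0 = 10, mu = 0.95, epsilon = 1e-05, N = 3000,
                      seed = 12345) {
  set.seed(seed)
  n <- nrow(d)
  current <- sample(n, M)
  cur_cost <- avg_sp(d, current)
  best <- current
  best_cost <- cur_cost
  Tm <- Tm0
  iter <- 0
  while (Tm > epsilon && iter < N) {
    iter <- iter + 1
    # Propose swapping one module member for one node outside the module.
    candidates <- setdiff(seq_len(n), current)
    proposal <- current
    proposal[sample(M, 1)] <- candidates[sample(length(candidates), 1)]
    prop_cost <- avg_sp(d, proposal)
    # Metropolis rule: always accept improvements, sometimes accept worse moves.
    if (prop_cost <= cur_cost || runif(1) < exp((cur_cost - prop_cost) / Tm)) {
      current <- proposal
      cur_cost <- prop_cost
    }
    if (cur_cost < best_cost) {
      best <- current
      best_cost <- cur_cost
    }
    Tm <- Tm * mu  # geometric cooling
  }
  list(module = sort(best), cost = best_cost, iterations = iter)
}

res <- sa_search(d, M = 3)
```

Under this assumed schedule, the default values of `Tm0`, `mu` and `epsilon` cool the temperature below `epsilon` well before `N` steps are reached, so `N` acts only as an upper bound.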

Value

A list containing the following elements.

minG.ls: contains the following information for each module size from minM to maxM. minG has the genes in the module whose average shortest path is optimized. p.mean, p.sd and sp are the p-value, its standard deviation (sd) and the average shortest path, respectively. null.sp.mean and null.sp.median are from the permuted null distribution. If SA is selected, the top topG_from_previous results at each module size are stored in top.G.
bestSize: minG.ls results for the largest module size within 2 sd of the smallest p-value.
mergePMmat: a merged posterior DE mean matrix based on topology nodes (one node can have multiple genes).
KEGGspecies: the KEGG species abbreviation.
KEGGpathwayID: the KEGG pathway ID.
data.pair: the two study names.
module.type: discordant or concordant modules.

In addition, the elbow plot of -log10(p-value) for each module size will be saved to the filePath.
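The bestSize rule ("the largest module size within 2 sd of the smallest p-value") can be sketched as follows. The numbers are made up, `pick_best_size` is a hypothetical helper, and the sketch assumes that the sd in question is the one at the module size with the smallest p-value; the package may apply the rule differently.

```r
# Hypothetical per-size summaries (not real KEGG_module output).
sizes  <- 4:9
p.mean <- c(0.040, 0.012, 0.004, 0.006, 0.009, 0.030)
p.sd   <- c(0.006, 0.003, 0.002, 0.002, 0.003, 0.005)

# Largest size whose mean p-value lies within 2 sd of the smallest mean p-value.
pick_best_size <- function(sizes, p.mean, p.sd) {
  i_min <- which.min(p.mean)
  threshold <- p.mean[i_min] + 2 * p.sd[i_min]
  max(sizes[p.mean <= threshold])
}

best <- pick_best_size(sizes, p.mean, p.sd)

# Values an elbow plot would display on its y-axis.
neg_log10_p <- -log10(p.mean)
```

Here the smallest p-value occurs at size 6, and size 7 is the largest size still within the threshold, so the rule favors the bigger module when its p-value is statistically indistinguishable from the best one.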

Examples

## Not run:
# mcmc.merge.list comes from the merge step (see the example in function 'merge')
dataset.names <- c("hb", "hs", "ht", "ha", "hi", "hl",
                   "mb", "ms", "mt", "ma", "mi", "ml")
res_hsa04670 <- KEGG_module(mcmc.merge.list, dataset.names, KEGGspecies = "hsa",
                            KEGGpathwayID = "04670", data.pair = c("hs", "ms"),
                            gene_type = "discordant",
                            DE_PM_cut = 0.2, minM = 4, maxM = NULL,
                            B = 1000, cores = 1,
                            search_method = "Exhaustive",
                            Elbow_plot = TRUE, filePath = getwd())

## End(Not run)

CAMO-R/Rpackage documentation built on July 20, 2023, 6:04 a.m.