ssea2kda: Generate inputs for wKDA
In zeynebkurtUCLA/Mergeomics: Integrative network analysis of omics data

Description Usage Arguments Details Value Author(s) References See Also Examples

ssea2kda forwards MSEA results to weighted key driver analysis (wKDA) from the first MSEA results, merges the overlapped modules according to a given overlapping ratio to obtain a relatively independent module set, apply a second MSEA on the merged modules (supersets), updates and saves the second MSEA results properly for wKDA process.

1	ssea2kda(job, symbols = NULL, rmax = NULL, min.module.count=NULL)

`job`	data list including the organized results of MSEA process. It has following components: results: updated information of modules including: number of distinct member genes (NGENES), number of distinct member markers (NLOCI), ratio of distinct to non-distinct markers (DENSITY), false discovery rates (FDR). generesults: updated gene-specific information including: indexed gene identity (GENE), gene size (NLOCI), unadjusted enrichment score (SCORE), marker with max value (LOCUS), marker value (VALUE).
`symbols`	dataframe for translating gene symbols
`rmax`	maximum allowed overlap ratio between gene sets
`min.module.count`	minimum number of the pathways to be taken from the MSEA results to merge. Default value is NULL. If it is not specified, all the pathways having MSEA-FDR value less than 0.25 will be considered for merging if they are overlapping with the given ratio `rmax`.

ssea2kda gets genes and top markers from input files, selects significant modules with respect to ordered p-values, gets identities of modules and genes, merges and trims the overlapping modules (either having FDR less than 0.25 or top min.module.count modules when ranked up to the P-values), obtains enrichment scores for merged modules, translates the gene symbols (between species) if needed, and finally saves the module, gene, node, and marker information into relevant output files.

plan

an updated data list with the following components:

label: unique identifier for the analysis.
parent: parent folder for results.
modfile: path of module file (columns: MODULE NODE).
inffile: path of module information file 
(columns: MODULE DESCR).
nodfile: path of node selection file (columns: NODE).

Ville-Petteri Makinen

Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, Orozco LD, Pellegrini M, Lusis AJ, Ripatti S, Zhang B, Inouye M, Makinen V-P, Yang X. Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems. BMC genomics. 2016;17(1):874.

ssea.analyze, kda.analyze

job.msea <- list()
job.msea$label <- "hdlc"
job.msea$folder <- "Results"
job.msea$genfile <- system.file("extdata", 
"genes.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$marfile <- system.file("extdata", 
"marker.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$modfile <- system.file("extdata", 
"modules.mousecoexpr.liver.human.txt", package="Mergeomics")
job.msea$inffile <- system.file("extdata", 
"coexpr.info.txt", package="Mergeomics")
job.msea$nperm <- 100 ## default value is 20000

## ssea.start() process takes long time while merging the genes sharing high
## amounts of markers (e.g. loci). it is performed with full module list in
## the vignettes. Here, we used a very subset of the module list (1st 10 mods
## from the original module file) and we collected the corresponding genes
## and markers belonging to these modules:
moddata <- tool.read(job.msea$modfile)
gendata <- tool.read(job.msea$genfile)
mardata <- tool.read(job.msea$marfile)
mod.names <- unique(moddata$MODULE)[1:min(length(unique(moddata$MODULE)),
10)]
moddata <- moddata[which(!is.na(match(moddata$MODULE, mod.names))),]
gendata <- gendata[which(!is.na(match(gendata$GENE, 
unique(moddata$GENE)))),]
mardata <- mardata[which(!is.na(match(mardata$MARKER, 
unique(gendata$MARKER)))),]

## save this to a temporary file and set its path as new job.msea$modfile:
tool.save(moddata, "subsetof.coexpr.modules.txt")
tool.save(gendata, "subsetof.genfile.txt")
tool.save(mardata, "subsetof.marfile.txt")
job.msea$modfile <- "subsetof.coexpr.modules.txt"
job.msea$genfile <- "subsetof.genfile.txt"
job.msea$marfile <- "subsetof.marfile.txt"
## run ssea.start() and prepare for this small set: (due to the huge runtime)
job.msea <- ssea.start(job.msea)
job.msea <- ssea.prepare(job.msea)
job.msea <- ssea.control(job.msea)
job.msea <- ssea.analyze(job.msea)
job.msea <- ssea.finish(job.msea)

###############  Create intermediary datasets for KDA ##################
syms <- tool.read(system.file("extdata", "symbols.txt", 
package="Mergeomics"))
syms <- syms[,c("HUMAN", "MOUSE")]
names(syms) <- c("FROM", "TO")
job.ssea2kda <- ssea2kda(job.msea, symbols=syms)

## Remove the temporary files used for the test:
file.remove("subsetof.coexpr.modules.txt")
file.remove("subsetof.genfile.txt")
file.remove("subsetof.marfile.txt")