ssea.analyze: Marker set enrichment analysis (MSEA)
In zeynebkurtUCLA/Mergeomics: Integrative network analysis of omics data

Description Usage Arguments Details Value Author(s) References See Also Examples

ssea.analyze finds the enrichment of the pathways or co-expression modules by a marker set (e.g. associated risk variants -loci- of a relevant disease). Association study by mapping markers (e.g. SNPs) to genes (e.g. via expression QTLs). Enrichment P-values obtained by the MSEA denote the degree of enrichment of significantly disease-associated (high ranking) markers (e.g. eSNPs) within these pathways when compared to the null distribution of expected uniform distribution of all ranks of the markers. MSEA is performed with either gene-level or marker-level permutations based on Gaussian distribution.

1	ssea.analyze(job, trim_start, trim_end)

`job`	the data list including fields: seed for random number generator `(job$seed)` (to obtain the same results from permutation when the same input set is given), random permutation level `(job$permtype)` (either gene- or marker-level), maximum number of permutations `(job$nperm)` for the permutation test, and the database that uses indexed identities for modules, genes, and markers (e.g. loci) `(job$database)`.
`trim_start`	percentile taken from the beginning for trimming away a defined proportion of genes with significant trait association to avoid signal inflation of null background in gene permutation. Default value is 0.002.
`trim_end`	percentile taken from the ending point for trimming away a defined proportion of genes with significant trait association to avoid signal inflation of null background in gene permutation. Default value is 0.998.

ssea.analyze associates the gene sets (pathways or co-expression modules) with relevant disease (e.g. Coronary Artery Disease) association data by mapping markers (e.g. SNPs) to genes (e.g. via expression QTLs). It performs the MSEA by using observed and estimated enrichment scores. First, the observed enrichment scores of the pathways by markers (e.g. loci) are calculated. Then, a Gaussian distribution based simulation is performed, by using the statistics of the observed scores (mean, std.dev., etc.), to obtain the estimated enrichment scores, enrichment frequencies, and other statistics e.g. p-values for the pathways. ssea.analyze trims away a defined proportion of genes with significant trait association to avoid signal inflation of null background in gene permutation by using trim_start and trim_end.

job

the updated data frame including results: indexed module identity, enrichment P-values, raw frequencies (raw frequency of a gene set defines the number of the estimated enrichment scores that are larger than this gene set's enrichment score under the null distribution based on Gaussian function)

Ville-Petteri Makinen

Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, Orozco LD, Pellegrini M, Lusis AJ, Ripatti S, Zhang B, Inouye M, Makinen V-P, Yang X. Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems. BMC genomics. 2016;17(1):874.

ssea.control, ssea.finish, ssea.prepare, ssea.start, ssea2kda

job.msea <- list()
job.msea$label <- "hdlc"
job.msea$folder <- "Results"
job.msea$genfile <- system.file("extdata", 
"genes.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$marfile <- system.file("extdata", 
"marker.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$modfile <- system.file("extdata", 
"modules.mousecoexpr.liver.human.txt", package="Mergeomics")
job.msea$inffile <- system.file("extdata", 
"coexpr.info.txt", package="Mergeomics")
job.msea$nperm <- 100 ## default value is 20000

## ssea.start() process takes long time while merging the genes sharing high
## amounts of markers (e.g. loci). it is performed with full module list in
## the vignettes. Here, we used a very subset of the module list (1st 10 mods
## from the original module file) and we collected the corresponding genes
## and markers belonging to these modules:
moddata <- tool.read(job.msea$modfile)
gendata <- tool.read(job.msea$genfile)
mardata <- tool.read(job.msea$marfile)
mod.names <- unique(moddata$MODULE)[1:min(length(unique(moddata$MODULE)),
10)]
moddata <- moddata[which(!is.na(match(moddata$MODULE, mod.names))),]
gendata <- gendata[which(!is.na(match(gendata$GENE, 
unique(moddata$GENE)))),]
mardata <- mardata[which(!is.na(match(mardata$MARKER, 
unique(gendata$MARKER)))),]

## save this to a temporary file and set its path as new job.msea$modfile:
tool.save(moddata, "subsetof.coexpr.modules.txt")
tool.save(gendata, "subsetof.genfile.txt")
tool.save(mardata, "subsetof.marfile.txt")
job.msea$modfile <- "subsetof.coexpr.modules.txt"
job.msea$genfile <- "subsetof.genfile.txt"
job.msea$marfile <- "subsetof.marfile.txt"
## run ssea.start() and prepare for this small set: (due to the huge runtime)
job.msea <- ssea.start(job.msea)
job.msea <- ssea.prepare(job.msea)
job.msea <- ssea.control(job.msea)
job.msea <- ssea.analyze(job.msea)

## Remove the temporary files used for the test:
file.remove("subsetof.coexpr.modules.txt")
file.remove("subsetof.genfile.txt")
file.remove("subsetof.marfile.txt")