ssea.prepare: Prepare an indexed database for MSEA

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/cle.LS.R

Description

ssea.prepare prepares a database that includes hierarchical for modules, i.e. it collects gene list and unique marker list of the modules for MSEA process

Usage

1

Arguments

job

a data list with the following components:

modules: module identities as characters.
genesgene: identities as characters.
loci: marker identities as characters.
moddata: preprocessed module data (indexed identities).
gendata: preprocessed mapping data (indexed identities).
locdata: preprocessed marker data (indexed identities).
mingenes: minimum module size allowed.
maxgenes: maximum module size allowed.
maxoverlap: maximum module overlap allowed (1.0 to skip).
quantiles: quantile points for test statistic.

Details

ssea.prepare removes extreme-sized modules, constructs a hierarchical representation of genes and modules, obtains hit counts for markers, and returns the finalized module, genes, markers, database information.

Value

job

an updated data list with the following components:

modules: finalized module names.
moddata: finalized module data.
gendata: finalized mapping data.
locdata: finalized marker data.
quantiles: verified quantile points.
database$modulesizes: gene counts for modules.
database$modulelengths: distinct markers counts for
modules.
database$moduledensities: ratio between distinct and
non-distinct markers.
database$genesizeslocus: count for each gene.
database$module2genes: gene lists for each module.
database$gene2locilocus: lists for each gene.
database$locus2row: indices in the marker data frame
for each marker.
database$observed: matrix of observed counts of values
that exceed each quantile point for each marker.
database$expected: 1.0 - quantile points.

Author(s)

Ville-Petteri Makinen

References

Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, Orozco LD, Pellegrini M, Lusis AJ, Ripatti S, Zhang B, Inouye M, Makinen V-P, Yang X. Mergeomics: multidimensional data integration to identify pathogenic perturbations to biological systems. BMC genomics. 2016;17(1):874.

See Also

ssea.analyze, ssea.control, ssea.finish, ssea.start, ssea2kda

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
job.msea <- list()
job.msea$label <- "hdlc"
job.msea$folder <- "Results"
job.msea$genfile <- system.file("extdata", 
"genes.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$marfile <- system.file("extdata", 
"marker.hdlc_040kb_ld70.human_eliminated.txt", package="Mergeomics")
job.msea$modfile <- system.file("extdata", 
"modules.mousecoexpr.liver.human.txt", package="Mergeomics")
job.msea$inffile <- system.file("extdata", 
"coexpr.info.txt", package="Mergeomics")
job.msea$nperm <- 100 ## default value is 20000

## ssea.start() process takes long time while merging the genes sharing high
## amounts of markers (e.g. loci). it is performed with full module list in
## the vignettes. Here, we used a very subset of the module list (1st 10 mods
## from the original module file) and we collected the corresponding genes
## and markers belonging to these modules:
moddata <- tool.read(job.msea$modfile)
gendata <- tool.read(job.msea$genfile)
mardata <- tool.read(job.msea$marfile)
mod.names <- unique(moddata$MODULE)[1:min(length(unique(moddata$MODULE)),
10)]
moddata <- moddata[which(!is.na(match(moddata$MODULE, mod.names))),]
gendata <- gendata[which(!is.na(match(gendata$GENE, 
unique(moddata$GENE)))),]
mardata <- mardata[which(!is.na(match(mardata$MARKER, 
unique(gendata$MARKER)))),]

## save this to a temporary file and set its path as new job.msea$modfile:
tool.save(moddata, "subsetof.coexpr.modules.txt")
tool.save(gendata, "subsetof.genfile.txt")
tool.save(mardata, "subsetof.marfile.txt")
job.msea$modfile <- "subsetof.coexpr.modules.txt"
job.msea$genfile <- "subsetof.genfile.txt"
job.msea$marfile <- "subsetof.marfile.txt"
## run ssea.start() and prepare for this small set: (due to the huge runtime)
job.msea <- ssea.start(job.msea)
job.msea <- ssea.prepare(job.msea)

## Remove the temporary files used for the test:
file.remove("subsetof.coexpr.modules.txt")
file.remove("subsetof.genfile.txt")
file.remove("subsetof.marfile.txt")

Example output

Writing to file... 
Saved 1744 rows in 'subsetof.coexpr.modules.txt'.
[1] "subsetof.coexpr.modules.txt"

Writing to file... 
Saved 7376 rows in 'subsetof.genfile.txt'.
[1] "subsetof.genfile.txt"

Writing to file... 
Saved 7070 rows in 'subsetof.marfile.txt'.
[1] "subsetof.marfile.txt"

MSEA Version:01.04.2016

Parameters:
  Permutation type: gene
  Permutations: 100
  Random seed: 1
  Minimum gene count: 10
  Maximum gene count: 500
  Maximum overlap between genes: 0.33

Importing modules...
    MODULE             DESCR          
 Length:20          Length:20         
 Class :character   Class :character  
 Mode  :character   Mode  :character  
    MODULE              GENE          
 Length:1744        Length:1744       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

Importing marker values...
    MARKER              VALUE         
 Length:7070        Min.   :  0.8094  
 Class :character   1st Qu.:  0.9531  
 Mode  :character   Median :  1.1685  
                    Mean   :  1.8574  
                    3rd Qu.:  1.5491  
                    Max.   :323.0100  

Importing mapping data...
     GENE              MARKER         
 Length:7376        Length:7376       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

Merging genes containing shared markers...

16110 comparisons

7260 comparisons

7021 comparisons
Job: 0.7925262 Mb

Preparing data structures...
Job: 1.21257 Mb
[1] TRUE
[1] TRUE
[1] TRUE

Mergeomics documentation built on Nov. 8, 2020, 6:58 p.m.