opossom.new: Initialize the oposSOM pipeline.

View source: R/opossom.r

opossom.newR Documentation

Initialize the oposSOM pipeline.

Description

This function initializes the oposSOM environment and sets the preferences.

Usage

  opossom.new(preferences)

Arguments

preferences

list with the following optional values:

  • indata: input data matrix containing the expression values or an Biobase::ExpressionSet object (see 'Details' and 'Examples')

  • group.labels: sample assignment to a distinct group, subtype or class (character; "auto" or one label for each sample; may be given with indata ExpressionSet)

  • group.colors: colors of the samples for diverse visualizations (character; one color for each sample; may be given with indata ExpressionSet)

  • dataset.name: name of the dataset; used to name results folder and environment image (character).

  • note a short note shown in html summary file to give some keywords about the data or analysis parameters (character).

  • dim.1stLvlSom: dimension of primary SOM; use "auto" to apply automatic size estimation (integer, >5)

  • dim.2ndLvlSom: dimensions of second level SOM (integer, >5)

  • max.cores: maximum number of cores utilized by parallelized computing (integer, >0)

  • training.extension: factor to extend the number of iterations in SOM training (numerical, >0)

  • rotate.SOM.portraits: number of roations of the primary SOM in counter-clockwise fashion (integer {0,1,2,3})

  • flip.SOM.portraits: mirroring the primary SOM along the bottom-left to top-right diagonal (boolean)

  • database.dataset: type of ensemble dataset addressed with biomaRt interface; use "auto" to detect parameter automatically (character)

  • database.id.type: type of rowname identifier in biomaRt database; obsolete if database.dataset="auto" (character)

  • activated.modules (list): activates/deactivates pipeline functionalities:

    • largedata.mode (boolean): enables or disables calculation and output of single sample analyses (default: NULL). When activated, time-consuming functionalities are skipped. When NULL, large data mode is automatically activated when number of samples exceeds 1000.

    • reporting (boolean): enables or disables output of pdf and csv results and html summaries (default: TRUE). When deactivated, only R workspace will be stored after analysis.

    • primary.analysis (boolean): enables or disables data preprocessing and SOM training (default: TRUE). When deactivated, prior SOM training results are required to be contained in the workspace environment.

    • sample.similarity.analysis (boolean): enables or disables diversity analyses such as clustering heatmaps, correlation networks and ICA (default: TRUE).

    • geneset.analysis (boolean): enables or disables geneset analysis (default: TRUE).

    • psf.analysis (boolean): enables or disables pathway signal flow (PSF) analysis (default: TRUE). Human gene expression data is required as input data.

    • group.analysis (boolean): enables or disables group centered analyses such as group portraits and functional mining (default: TRUE).

    • difference.analysis (boolean): enables or disables pairwise comparisons of the grous and of pairs provided by pairwise.comparison.list as described below (default: TRUE).

  • standard.spot.modules: spot modules utilized in diverse downstream analyses (character, one of {"overexpression", "group.overexpression", "underexpression", "kmeans", "correlation", "dmap"})

  • spot.coresize.modules: spot detection in summary maps, minimum size (numerical, >0)

  • spot.threshold.modules: spot detection in summary maps, expression threshold (numerical, between 0 and 1)

  • spot.coresize.groupmap: spot detection in group-specific summary maps , minimum size (numerical, >0)

  • spot.threshold.groupmap: spot detection in group-specific summary maps, expression threshold (numerical, between 0 and 1)

  • feature.centralization: enables centralization of the features (boolean)

  • sample.quantile.normalization: enables quantile normalization of the samples (boolean)

  • pairwise.comparison.list: group list for pairwise analyses (list of group lists, see 'Examples') or NULL otherwise

Details

The package accepts the indata parameter in two formats:<br> Firstly a simple two-dimensional numerical matrix, where the columns and rows represent the samples and genes, respectively. The expression values are usually obtained by calibration and summarization algorithms (e.g. MAS5, VSN or RMA), and transformed into logarithmic scale prior to utilizing them in the pipeline. Secondly the input data can also be given as Biobase::ExpressionSet object. Please check the vignette for more details on the parameters.

Value

A new oposSOM environment which is passed to opossom.run.

Examples

env <- opossom.new(list(dataset.name="Example",
						note="a test with 10 random samples",	
						max.cores=2,
                        dim.1stLvlSom="auto",
                        dim.2ndLvlSom=10,
                        training.extension=1,
                        rotate.SOM.portraits=0,
                        flip.SOM.portraits=FALSE,
                        database.dataset="auto",
                        activated.modules = list( 
						  "largedata.mode" = NULL,
						  "reporting" = TRUE,
                          "primary.analysis" = TRUE, 
                          "sample.similarity.analysis" = TRUE,
                          "geneset.analysis" = TRUE, 
                          "psf.analysis" = TRUE,
                          "group.analysis" = TRUE,
                          "difference.analysis" = TRUE ),										
                        standard.spot.modules="dmap",
                        spot.coresize.modules=4,
                        spot.threshold.modules=0.9,
                        spot.coresize.groupmap=4,
                        spot.threshold.groupmap=0.7,
                        feature.centralization=TRUE,
                        sample.quantile.normalization=TRUE,
                        pairwise.comparison.list=list(
                          list("groupA"=c("sample1", "sample2"),
                               "groupB"=c("sample3", "sample4")))))


# definition of indata, group.labels and group.colors
env$indata = matrix( runif(1000), 100, 10 )
env$group.labels = c( rep("class 1", 5), rep("class 2", 4), "class 3" )
env$group.colors = c( rep("red", 5), rep("blue", 4), "green" )

# alternative definition of indata, group.labels and group.colors using Biobase::ExpressionSet
library(Biobase)

env$indata = ExpressionSet( assayData=matrix(runif(1000), 100, 10),
                            phenoData=AnnotatedDataFrame(data.frame( 
                                group.labels = c( rep("class 1", 5), rep("class 2", 4), "class 3" ),
                                group.colors = c( rep("red", 5), rep("blue", 4), "green" ) ))
                          )


hloefflerwirth/oposSOM documentation built on Oct. 29, 2024, 4:12 a.m.