rRoma.R: Perform ROMA on a dataset


View source: R/Recode.R

Description

Perform ROMA on a dataset

Usage

rRoma.R(ExpressionMatrix, ModuleList, centerData = TRUE, ExpFilter = FALSE,
  UseWeigths = FALSE, DefaultWeight = 1, MinGenes = 10, MaxGenes = 1000,
  ApproxSamples = 5, nSamples = 100, OutGeneNumber = 5, Ncomp = 10,
  OutGeneSpace = NULL, FixedCenter = TRUE,
  GeneOutDetection = "L1OutExpOut", GeneOutThr = 5, GeneSelMode = "All",
  SampleFilter = TRUE, MoreInfo = FALSE, PlotData = FALSE, PCADims = 2,
  PCSignMode = "none", PCSignThr = NULL, UseParallel = FALSE,
  nCores = NULL, ClusType = "PSOCK", SamplingGeneWeights = NULL,
  FillNAMethod = list(), Grouping = NULL, FullSampleInfo = FALSE,
  GroupPCSign = FALSE, CorMethod = "pearson",
  PCAType = "DimensionsAreGenes", SuppressWarning = FALSE,
  ShowParallelPB = TRUE)

Arguments

ExpressionMatrix

matrix, a numeric matrix containing the gene expression information. Columns indicate samples and rows indicate genes.

ModuleList

list, gene module list

centerData

logical, should the gene expression values be centered over the samples?
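As a rough illustration of what centering genes over samples means for a genes-by-samples matrix, here is a minimal base-R sketch. It assumes centering is a plain subtraction of each gene's mean across samples; `center_genes` is a hypothetical helper, not an rRoma function.

```r
# Toy expression matrix: rows are genes, columns are samples
set.seed(1)
expr <- matrix(rnorm(20, mean = 5), nrow = 4,
               dimnames = list(paste0("g", 1:4), paste0("s", 1:5)))

# Hypothetical helper (not part of rRoma): subtract each gene's mean across
# samples. rowMeans() returns one value per row and R recycles it down the
# columns, so row i has its own mean removed.
center_genes <- function(m) {
  m - rowMeans(m)
}

centered <- center_genes(expr)
rowMeans(centered)  # every gene now has (numerically) zero mean
```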

ExpFilter

logical, should the samples be filtered?

UseWeigths

logical, should the weights be used for PCA calculation?

DefaultWeight

integer scalar, the default weight to use if no weight is specified by the module file and an algorithm requiring weights is used

MinGenes

integer, the minimum number of module genes that must be present in the expression matrix for the module to be processed

MaxGenes

integer, the maximum number of module genes that may be present in the expression matrix for the module to be processed
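The MinGenes/MaxGenes rule can be sketched in base R as follows. This assumes each module is represented simply as a character vector of gene names and that both bounds are inclusive; rRoma's real module objects carry more fields, and `filter_modules` is a hypothetical helper.

```r
# Hypothetical simplified modules: each element is a vector of gene names
modules <- list(
  small = c("g1", "g2"),
  ok    = c("g1", "g2", "g3", "gX"),
  big   = paste0("g", 1:10)
)
genes_in_matrix <- paste0("g", 1:6)  # rownames of a toy expression matrix

# Keep modules whose number of genes found in the matrix lies in
# [min_genes, max_genes] (inclusive bounds are an assumption)
filter_modules <- function(modules, available, min_genes, max_genes) {
  n_found <- vapply(modules,
                    function(g) length(intersect(g, available)),
                    integer(1))
  modules[n_found >= min_genes & n_found <= max_genes]
}

kept <- filter_modules(modules, genes_in_matrix, min_genes = 3, max_genes = 5)
names(kept)  # "ok": 'small' has 2 genes in the matrix, 'big' has 6
```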

ApproxSamples

integer between 0 and 100, the approximation parameter used to reuse samples. This is the minimal percentage variation in gene set size that triggers a new sampling. For example, 5 means that samples are recalculated only if the number of genes in the gene set has increased by at least 5%.
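The reuse rule described for ApproxSamples amounts to a percentage comparison. Below is a minimal sketch of that decision, assuming the variation is measured relative to the gene set size the existing samples were computed for; `needs_resampling` is a hypothetical name, not rRoma code.

```r
# Hypothetical helper: should random samples computed for gene sets of size
# n_old be recomputed for a module of size n_new?
needs_resampling <- function(n_old, n_new, approx_pct = 5) {
  100 * (n_new - n_old) / n_old >= approx_pct
}

needs_resampling(100, 103)  # FALSE: a 3% increase, samples are reused
needs_resampling(100, 105)  # TRUE: a 5% increase, samples are recalculated
```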

nSamples

integer, the number of randomized gene sets sampled (per module)

OutGeneNumber

scalar, number of median-absolute-deviations away from median required for the total number of genes expressed in a sample to be called an outlier
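A minimal sketch of the OutGeneNumber criterion, under two assumptions not stated above: a gene counts as "expressed" in a sample when its value is greater than 0, and the MAD is R's `mad()` (which includes the usual 1.4826 consistency factor). `flag_gene_count_outliers` is hypothetical.

```r
# Hypothetical helper: count expressed genes per sample (value > threshold,
# an assumption) and flag samples more than n_mad MADs from the median count
flag_gene_count_outliers <- function(expr, n_mad = 5, threshold = 0) {
  n_expressed <- colSums(expr > threshold)
  abs(n_expressed - median(n_expressed)) > n_mad * mad(n_expressed)
}

# Toy data: 100 genes x 9 samples; sample 9 expresses far fewer genes
n_genes <- 100
counts <- c(80, 82, 79, 81, 83, 80, 78, 81, 20)
expr <- sapply(counts, function(k) c(rep(1, k), rep(0, n_genes - k)))

which(flag_gene_count_outliers(expr, n_mad = 5))  # 9
```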

Ncomp

integer, number of principal components used to filter samples in the gene expression space

OutGeneSpace

scalar, number of median-absolute-deviations away from the median required for a sample to be called an outlier in the gene expression space. If set to NULL, the gene space filtering will not be performed.
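One plausible reading of how Ncomp and OutGeneSpace combine is sketched below: project the samples on the first Ncomp principal components and flag a sample if its score on any component lies more than OutGeneSpace MADs from the median. The "any component" rule and the helper `flag_space_outliers` are assumptions, not rRoma's implementation.

```r
# Hypothetical helper: MAD-based sample outliers in the gene expression space
flag_space_outliers <- function(expr, ncomp = 10, n_mad = 5) {
  # Samples are the points and genes the dimensions, so transpose first
  pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
  k <- min(ncomp, ncol(pca$x))
  scores <- pca$x[, seq_len(k), drop = FALSE]
  # One column per component: TRUE where a sample is a MAD outlier
  flags <- apply(scores, 2, function(s) abs(s - median(s)) > n_mad * mad(s))
  # Assumption: a sample is an outlier if flagged on any component
  apply(flags, 1, any)
}

set.seed(3)
expr <- matrix(rnorm(50 * 20), nrow = 50)  # 50 genes x 20 samples
expr[, 20] <- expr[, 20] + 10              # shift every gene in sample 20
out <- flag_space_outliers(expr, ncomp = 2, n_mad = 5)
out[20]  # the shifted sample is flagged
```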

FixedCenter

logical, should PCA with fixed center be used?

GeneOutDetection

character scalar, the algorithm used to filter genes in a module. Possible values are

  • 'L1OutVarPerc': percentage variation relative to the median variance explained supported by a leave one out approach

  • 'L1OutVarDC': dendrogram clustering statistics on variance explained supported by a leave one out approach

  • 'L1OutExpOut': number of median-absolute-deviations away from the median explained variance

  • 'L1OutSdMean': number of standard deviations away from the mean

The option "L1OutExpOut" requires the scater package to be installed.
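The leave-one-out idea behind 'L1OutExpOut' can be sketched in base R: drop each gene in turn, recompute the variance explained by the first PC, and flag genes whose leave-one-out value is far from the median. This assumes "explained variance" means the PC1 share from `prcomp()`, and it uses base `mad()` instead of the scater routine the package relies on; `l1out_exp_out` is hypothetical.

```r
# Share of variance explained by PC1 (genes are the dimensions)
pc1_var_explained <- function(m) {
  pca <- prcomp(t(m), center = TRUE, scale. = FALSE)
  pca$sdev[1]^2 / sum(pca$sdev^2)
}

# Hypothetical helper: leave-one-out PC1 variance, MAD outlier detection
l1out_exp_out <- function(m, n_mad = 5) {
  loo <- vapply(seq_len(nrow(m)),
                function(i) pc1_var_explained(m[-i, , drop = FALSE]),
                numeric(1))
  names(loo) <- rownames(m)
  abs(loo - median(loo)) > n_mad * mad(loo)
}

# Toy module: 9 co-expressed genes plus one high-variance noise gene
set.seed(4)
s <- rnorm(20)
m <- rbind(t(sapply(1:9, function(i) s + rnorm(20, sd = 0.1))),
           rnorm(20, sd = 10))
rownames(m) <- paste0("g", 1:10)
out <- l1out_exp_out(m)
out[["g10"]]  # removing the noise gene changes PC1 the most
```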

GeneOutThr

scalar, threshold used by the gene filtering algorithm in the modules. It can represent the maximum size of the filtered cluster ("L1OutVarDC"), the minimal percentage variation ("L1OutVarPerc") or the number of median-absolute-deviations away from the median ("L1OutExpOut").

GeneSelMode

character scalar, mode used to sample genes: all available genes ("All") or genes not present in the module ("Others")
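The two GeneSelMode pools can be illustrated with a small sampler. This sketch assumes each random gene set has the same size as the part of the module found in the data and is drawn without replacement; `sample_random_genesets` is a hypothetical helper, with `n_samples` playing the role of nSamples.

```r
# Hypothetical helper: draw n_samples random gene sets matching the module size
sample_random_genesets <- function(all_genes, module_genes, n_samples = 100,
                                   mode = c("All", "Others")) {
  mode <- match.arg(mode)
  # "All": every available gene; "Others": only genes outside the module
  pool <- if (mode == "All") all_genes else setdiff(all_genes, module_genes)
  size <- length(intersect(module_genes, all_genes))
  replicate(n_samples, sample(pool, size), simplify = FALSE)
}

set.seed(6)
all_genes <- paste0("g", 1:50)
module <- paste0("g", 1:5)
sets <- sample_random_genesets(all_genes, module, n_samples = 100, mode = "Others")
```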

SampleFilter

logical, should outlier detection be applied to sampled data as well?

MoreInfo

logical, should detailed information on the computation be printed?

PlotData

logical, should debugging plots be produced?

PCADims

integer, the number of PCA dimensions to compute. Should be >= 2. Note that the value 1 is allowed, but is not advisable under normal circumstances. Larger values decrease the error in the estimation of the explained variance but increase the computation time.

PCSignMode

character scalar, the modality used to determine the direction of the principal components. The following options are currently available:

  • 'none': the direction is chosen at random

  • 'PreferActivation': the direction is chosen in such a way that the sum of the projections is positive

  • 'UseAllWeights': as 'PreferActivation', but the projections are multiplied by the weights; missing weights are set to DefaultWeight

  • 'UseKnownWeights': as 'UseAllWeights', but missing weights are set to 0

  • 'CorrelateAllWeightsByGene': the direction is chosen in such a way as to maximise the positive correlation between the expression of genes with a positive (negative) weight and the (reversed) PC projections; missing weights are set to DefaultWeight

  • 'CorrelateKnownWeightsByGene': as 'CorrelateAllWeightsByGene', but missing weights are set to 0

  • 'CorrelateAllWeightsBySample': the direction is chosen in such a way as to maximise the positive correlation between the expression of genes and the PC corrected weights (i.e., PC weights multiplied by gene weights); missing weights are set to DefaultWeight

  • 'CorrelateKnownWeightsBySample': as 'CorrelateAllWeightsBySample', but missing weights are set to 0

If 'CorrelateAllWeightsByGene', 'CorrelateKnownWeightsByGene', 'CorrelateAllWeightsBySample' or 'CorrelateKnownWeightsBySample' is used and GroupPCSign is TRUE, the correlations will be computed on the groups defined by Grouping.
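The sum-based modes ('PreferActivation', 'UseAllWeights', 'UseKnownWeights') can be sketched as a sign choice on the per-gene PC projections. This is a minimal illustration assuming projections and weights are named numeric vectors keyed by gene and that a zero sum keeps the current direction; `pc_sign` is hypothetical, not the package's code.

```r
# Hypothetical helper: return 1 to keep the PC direction, -1 to flip it
pc_sign <- function(proj, weights = NULL,
                    mode = c("PreferActivation", "UseAllWeights", "UseKnownWeights"),
                    default_weight = 1) {
  mode <- match.arg(mode)
  w <- rep(1, length(proj))
  if (mode != "PreferActivation") {
    # Genes without a known weight get DefaultWeight ("All") or 0 ("Known")
    w <- rep(if (mode == "UseAllWeights") default_weight else 0, length(proj))
    if (!is.null(weights)) {
      known <- intersect(names(proj), names(weights))
      w[match(known, names(proj))] <- weights[known]
    }
  }
  # Keep the PC if the (weighted) sum of projections is non-negative
  if (sum(proj * w) >= 0) 1 else -1
}

proj <- c(g1 = -2, g2 = 0.5, g3 = -5)
weights <- c(g1 = -1, g2 = 1)             # no weight known for g3
pc_sign(proj)                             # -1
pc_sign(proj, weights, "UseAllWeights")   # -1 (g3 weighted by 1)
pc_sign(proj, weights, "UseKnownWeights") # 1  (g3 weighted by 0)
```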

PCSignThr

numeric scalar, a quantile threshold to limit the projections (or weights) to use, e.g., if equal to .9 only the 10% of genes with the largest projection (or weights) in absolute value will be considered.
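The PCSignThr filter can be sketched as a quantile cut on absolute values, assuming R's default `quantile()` definition (type 7); `top_by_quantile` is a hypothetical helper.

```r
# Hypothetical helper: keep only values whose absolute value is at or above
# the thr quantile of the absolute values; NULL disables the filter
top_by_quantile <- function(x, thr = NULL) {
  if (is.null(thr)) return(x)
  x[abs(x) >= quantile(abs(x), thr)]
}

x <- c(-10, 1:9)          # ten projections (or weights)
top_by_quantile(x, 0.9)   # -10: the 10% with the largest absolute value
```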

UseParallel

boolean, should a parallel environment be used? Note that using a parallel environment will increase memory usage, as a copy of the gene expression matrix is needed for each core.

nCores

integer, the number of cores to use if UseParallel is TRUE. Set to NULL for auto-detection

ClusType

string, the cluster type to use. The default value ("PSOCK") should be available on most systems; Unix-like environments also support "FORK", which should be faster.
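The UseParallel, nCores and ClusType settings map naturally onto R's bundled parallel package. The sketch below shows that general pattern, assuming auto-detection leaves one core free; `run_parallel` is a hypothetical wrapper, not how rRoma dispatches its work. With "PSOCK", each worker is a separate R process that receives its own copy of the data, which is why memory usage grows with the number of cores.

```r
library(parallel)

# Hypothetical wrapper: apply fun to each task on a temporary cluster
run_parallel <- function(tasks, fun, n_cores = NULL, clus_type = "PSOCK") {
  if (is.null(n_cores)) {
    # Auto-detect, leaving one core free; detectCores() may return NA
    n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)
  }
  cl <- makeCluster(n_cores, type = clus_type)
  on.exit(stopCluster(cl))
  parLapply(cl, tasks, fun)
}

res <- run_parallel(1:4, function(x) x^2, n_cores = 2)
unlist(res)  # 1 4 9 16
```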

SamplingGeneWeights

named vector, numeric. Weights to use when correcting the sign of the PC for sampled data.

FillNAMethod

named list, additional parameters to pass to the mice function

Grouping

named vector, the groups associated with the samples.

FullSampleInfo

boolean, should full PC information be computed and saved for all the randomised genesets?

GroupPCSign

boolean, should grouping information be used to orient PCs?

CorMethod

character string indicating which correlation coefficient is to be used for orienting the principal components. Can be "pearson", "kendall", or "spearman".
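To show where CorMethod enters, here is one plausible reading of 'CorrelateAllWeightsByGene': correlate each gene with the PC sample scores using the chosen method, sign each correlation by the gene's weight, and flip the PC if the total is negative. It assumes every gene already has a weight (missing-weight handling is omitted), and `pc_sign_by_gene_cor` is a hypothetical helper, not rRoma's code.

```r
# Hypothetical helper: orient a PC from gene/score correlations
pc_sign_by_gene_cor <- function(expr, scores, weights, method = "pearson") {
  # One correlation per gene (row) with the PC scores of the samples
  r <- apply(expr, 1, function(g) cor(g, scores, method = method))
  # Positive-weight genes should correlate positively, negative-weight
  # genes negatively
  total <- sum(sign(weights[rownames(expr)]) * r, na.rm = TRUE)
  if (total >= 0) 1 else -1
}

set.seed(5)
scores <- rnorm(15)
expr <- rbind(a = scores + rnorm(15, sd = 0.1),
              b = -scores + rnorm(15, sd = 0.1))
w <- c(a = 1, b = -1)
pc_sign_by_gene_cor(expr, scores, w, method = "spearman")   # 1
pc_sign_by_gene_cor(expr, -scores, w, method = "spearman")  # -1
```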

PCAType

character string, the type of PCA to perform. It can be "DimensionsAreGenes" or "DimensionsAreSamples"
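Read literally, the two PCAType values differ in which axis of the genes-by-samples matrix supplies the PCA dimensions. The sketch below shows that reading with `stats::prcomp()`; that rRoma uses exactly this mapping is an assumption.

```r
set.seed(7)
expr <- matrix(rnorm(6 * 4), nrow = 6)  # 6 genes x 4 samples

# "DimensionsAreGenes" (assumed): samples are the points, genes the dimensions
pca_genes_as_dims <- prcomp(t(expr))
# "DimensionsAreSamples" (assumed): genes are the points, samples the dimensions
pca_samples_as_dims <- prcomp(expr)

dim(pca_genes_as_dims$rotation)    # 6 4: one loading per gene
dim(pca_samples_as_dims$rotation)  # 4 4: one loading per sample
```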

SuppressWarning

boolean, should warnings be displayed? This option will be ignored in non-interactive sessions.

ShowParallelPB

boolean, should the progress bar be displayed when using parallel processing? Note that the progress bar is displayed via the pbapply package. This may slow down the computation, especially with FORK clusters.


Albluca/rRoma documentation built on May 5, 2019, 1:35 p.m.