computeUnivariateDigitization: Perform ternary digitization


View source: R/main.R

Description

Computes the ternary digitized form of a data matrix relative to a baseline matrix, along with related statistics and measures.

Usage

computeUnivariateDigitization(Mat, baseMat, computeQuantiles = TRUE,
  gamma = c(1:9/100, 1:9/10), beta = 0.95, alpha = 0.01,
  parallel = TRUE, verbose = TRUE, findGamma = TRUE, Groups = NULL,
  classes = NULL)

Arguments

Mat

Matrix of data to be digitized, in [0, 1], with each column corresponding to a sample and each row corresponding to a feature; usually in quantile form.

baseMat

Matrix of baseline data in [0, 1] (usually in quantile form), with each column corresponding to a sample and each row corresponding to a feature.

computeQuantiles

Apply quantile transformation to both data and baseline matrices (TRUE or FALSE; defaults to TRUE).

gamma

Range of gamma values to search through. By default gamma = 0.01, 0.02, ..., 0.09, 0.1, 0.2, ..., 0.9.

beta

Parameter for eliminating outliers (0 < beta <= 1). By default beta=0.95.

alpha

Target expected proportion of divergent features per sample. The function searches for the gamma value that yields this level of divergence in the baseline data. By default alpha = 0.01.

parallel

Logical indicating whether to compute features in parallel with mclapply on Unix-based systems (defaults to TRUE; switched to FALSE if the parallel package is not available).

verbose

Logical indicating whether to print status related messages during computation (defaults to TRUE).

findGamma

Logical indicating whether to search for optimal gamma values through the given gamma values (defaults to TRUE). If FALSE, the first value given in gamma will be used.

Groups

Factor indicating class association of samples (optional).

classes

Vector of class labels (optional).
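
As a rough illustration of the inputs described above, the following base R sketch builds a toy matrix in the expected shape. The rank-based transform is only an assumed stand-in for the package's quantile step (its exact definition is not given here), and the data, the helper to_quantiles, and the group labels are all hypothetical.

```r
# Toy raw data: 5 features (rows) x 4 samples (columns). Values are made up.
set.seed(1)
raw <- matrix(rexp(20), nrow = 5,
              dimnames = list(paste0("f", 1:5), paste0("s", 1:4)))

# ASSUMPTION: a simple per-sample rank transform, used here as a stand-in
# for the package's quantile step. It maps each column into (0, 1].
to_quantiles <- function(m) {
  q <- apply(m, 2, function(x) rank(x, ties.method = "average") / length(x))
  dimnames(q) <- dimnames(m)
  q
}
Mat <- to_quantiles(raw)

# The default gamma grid, exactly as written in the usage signature.
gamma_grid <- c(1:9 / 100, 1:9 / 10)

# Optional phenotype inputs: one label per column of Mat,
# plus the vector of class labels of interest.
Groups  <- factor(c("A", "A", "B", "B"))
classes <- levels(Groups)
```

If the matrices are already in [0, 1], computeQuantiles = FALSE would presumably skip the transformation step inside the function.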

Value

A list with elements:

Mat.div: divergence coding of the data matrix in ternary (-1, 0, 1) form, of the same dimensions as Mat.

baseMat.div: divergence coding of the base matrix in ternary (-1, 0, 1) form, of the same dimensions as Mat.

div: data frame with the number of divergent features in each sample, including upper and lower divergence.

features.div: data frame with the divergence probability of each feature; the divergence probability for each phenotype is included as well if the 'Groups' and 'classes' inputs were provided.

Baseline: a list containing a "Ranges" data frame with the baseline interval for each feature, and a "Support" binary matrix of the same dimensions as Mat indicating whether each sample was in the support of a feature or not (1 = in the support, 0 = not in the support).

gamma: selected gamma value.

alpha: the expected number of divergent features per sample computed over the baseline data matrix.

optimal: logical indicating whether the selected gamma value met the required alpha.

alpha_space: a data frame with the alpha value for each gamma searched.
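
To make the ternary coding and its summaries concrete, here is a minimal base R sketch on made-up data. It assumes each feature has a baseline interval and that values above it are coded 1, values below it -1, and values inside it 0. The interval bounds, the data frame column names, and the way the summaries are computed are illustrative assumptions, not the package's actual implementation.

```r
# Toy data in [0, 1]: 3 features (rows) x 4 samples (columns). Made-up values.
Mat <- matrix(c(0.10, 0.50, 0.90, 0.40,
                0.30, 0.35, 0.20, 0.95,
                0.60, 0.05, 0.55, 0.65),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("f", 1:3), paste0("s", 1:4)))

# Hypothetical baseline interval for each feature (one entry per row).
lower <- c(f1 = 0.30, f2 = 0.25, f3 = 0.50)
upper <- c(f1 = 0.60, f2 = 0.45, f3 = 0.70)

# Ternary coding: 1 above the interval, -1 below it, 0 inside it.
# The length-3 bounds recycle down each column, so they line up with rows.
Mat.div <- (Mat > upper) - (Mat < lower)

# Per-sample counts of divergent features (hypothetical column names).
div <- data.frame(
  sample      = colnames(Mat.div),
  count.div   = colSums(Mat.div != 0),
  count.upper = colSums(Mat.div == 1),
  count.lower = colSums(Mat.div == -1)
)

# Per-feature divergence probability: fraction of samples with a nonzero code.
features.div <- data.frame(
  feature  = rownames(Mat.div),
  prob.div = rowMeans(Mat.div != 0)
)

print(Mat.div)
print(div)
print(features.div)
```

A per-phenotype probability could be obtained the same way by restricting the columns to one class before calling rowMeans.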

Examples

baseMat = breastTCGA_Mat[, breastTCGA_Group == "NORMAL"]
dataMat = breastTCGA_Mat[, breastTCGA_Group != "NORMAL"]
div = computeUnivariateDigitization(
  Mat = dataMat,
  baseMat = baseMat,
  parallel = TRUE
)

wikum/divergence.preSE documentation built on Nov. 19, 2021, 3:37 a.m.