computeUnivariateDigitization: Perform ternary digitization
In wikum/divergence.preSE: Divergence Computations


Computes the digitized (ternary) form of a data matrix relative to a baseline matrix, along with related statistics and measures.

computeUnivariateDigitization(Mat, baseMat, computeQuantiles = TRUE,
  gamma = c(1:9/100, 1:9/10), beta = 0.95, alpha = 0.01,
  parallel = TRUE, verbose = TRUE, findGamma = TRUE, Groups = NULL,
  classes = NULL)

`Mat`	Matrix of data to be digitized, in [0, 1], with each column corresponding to a sample and each row corresponding to a feature; usually in quantile form.
`baseMat`	Matrix of baseline data in [0, 1] (usually in quantile form), with each column corresponding to a sample and each row corresponding to a feature.
`computeQuantiles`	Apply quantile transformation to both data and baseline matrices (TRUE or FALSE; defaults to TRUE).
`gamma`	Range of gamma values to search through. By default gamma = 0.01, 0.02, ... 0.09, 0.1, 0.2, ..., 0.9.
`beta`	Parameter for eliminating outliers (0 < beta <= 1). By default beta=0.95.
`alpha`	Expected proportion of divergent features per sample. The gamma value that provides this level of divergence in the baseline data will be searched for. By default alpha = 0.01.
`parallel`	Logical indicating whether to compute features in parallel with mclapply on Unix-based systems (defaults to TRUE; switched to FALSE if the parallel package is not available).
`verbose`	Logical indicating whether to print status messages during computation (defaults to TRUE).
`findGamma`	Logical indicating whether to search for optimal gamma values through the given gamma values (defaults to TRUE). If FALSE, the first value given in gamma will be used.
`Groups`	Factor indicating class association of samples (optional).
`classes`	Vector of class labels (optional).
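
To make the default `gamma` search grid concrete, the short sketch below evaluates the expression from the usage signature in base R. It does not call the package; the variable name `firstGamma` is only illustrative, and it simply mirrors the documented rule that the first value is used when `findGamma = FALSE`.

```r
# Default search grid, copied from the usage signature.
# `:` binds tighter than `/`, so this is (1:9)/100 followed by (1:9)/10.
gamma <- c(1:9/100, 1:9/10)

length(gamma)  # number of candidate gamma values
range(gamma)   # smallest and largest candidates

# Illustrative only: per the documentation, with findGamma = FALSE
# the first supplied value is the one that gets used.
firstGamma <- gamma[1]
```

Passing a shorter vector, for example `gamma = c(0.1, 0.2)`, restricts the search to those candidates.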

A list with elements:
`Mat.div`	Divergence coding of the data matrix in ternary (-1, 0, 1) form, of the same dimensions as Mat.
`baseMat.div`	Divergence coding of the base matrix in ternary (-1, 0, 1) form, of the same dimensions as Mat.
`div`	Data frame with the number of divergent features in each sample, including upper and lower divergence.
`features.div`	Data frame with the divergence probability of each feature; the divergence probability for each phenotype is included as well if the 'Groups' and 'classes' inputs were provided.
`Baseline`	A list containing a "Ranges" data frame with the baseline interval for each feature, and a "Support" binary matrix of the same dimensions as Mat indicating whether each sample was a support for a feature or not (1 = support, 0 = not in the support).
`gamma`	Selected gamma value.
`alpha`	The expected number of divergent features per sample computed over the baseline data matrix.
`optimal`	Logical indicating whether the selected gamma value met the required alpha.
`alpha_space`	A data frame with alpha values for each gamma searched.
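
As a rough illustration of what a ternary (-1, 0, 1) coding against per-feature baseline intervals looks like, here is a toy base R sketch. It is not the package's algorithm: the intervals in `ranges` are chosen by hand rather than estimated from baseline data, the names `ranges` and `toyDiv` are hypothetical, and the sign convention (-1 below the interval, 1 above it) is an assumption.

```r
# Toy illustration only; NOT how the package estimates baseline intervals.
# Hypothetical, hand-picked baseline interval for each feature (row).
ranges <- data.frame(lower = c(0.2, 0.4),
                     upper = c(0.6, 0.8),
                     row.names = c("f1", "f2"))

# Small data matrix in [0, 1]: rows are features, columns are samples.
Mat <- matrix(c(0.1, 0.5, 0.7,
                0.5, 0.9, 0.3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("f1", "f2"), c("s1", "s2", "s3")))

# Assumed convention: -1 below the interval, 1 above it, 0 inside it.
# The length-2 bound vectors recycle down each column, i.e. per feature.
toyDiv <- (Mat > ranges$upper) - (Mat < ranges$lower)
toyDiv

# Number of divergent features per sample in this toy coding.
colSums(toyDiv != 0)
```

The per-sample counts in the last line are loosely analogous to what the `div` data frame summarizes, although the package's own output carries more detail.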

baseMat = breastTCGA_Mat[, breastTCGA_Group == "NORMAL"]
dataMat = breastTCGA_Mat[, breastTCGA_Group != "NORMAL"]
div = computeUnivariateDigitization(
  Mat = dataMat,
  baseMat = baseMat,
  parallel = TRUE
)