classPerform: Classification performance based on divergences of...

View source: R/classPerform.R

classPerform    R Documentation

Classification performance based on divergences of methylation levels

Description

Evaluates the classification performance of an information divergence (e.g., the Hellinger divergence) carried in a list of GRanges objects. The total variation distance (TVD, the absolute difference of methylation levels) is used as a pivot to decide which cytosine sites are considered true positives and true negatives. The function confusionMatrix from package 'caret' is applied to obtain the classification performance.
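The quantities that confusionMatrix reports can be illustrated from first principles. The following is a minimal base-R sketch, not the package's code: the helper perf_from_labels is hypothetical, it takes logical truth (derived from the TVD) and prediction (derived from the divergence or p-value) vectors with TRUE as the positive class, and it applies the standard definitions of each statistic rather than calling caret.

```r
# Hypothetical helper (not part of MethylIT.utils): classification statistics
# from logical truth/prediction vectors, TRUE = positive class.
perf_from_labels <- function(truth, pred) {
  tp <- sum(truth & pred)     # true positives
  tn <- sum(!truth & !pred)   # true negatives
  fp <- sum(!truth & pred)    # false positives
  fn <- sum(truth & !pred)    # false negatives
  n <- tp + tn + fp + fn
  sens <- tp / (tp + fn)      # Sensitivity (= Recall)
  spec <- tn / (tn + fp)      # Specificity
  ppv  <- tp / (tp + fp)      # Pos Pred Value (= Precision)
  npv  <- tn / (tn + fn)      # Neg Pred Value
  c(Accuracy = (tp + tn) / n,
    Sensitivity = sens,
    Specificity = spec,
    `Pos Pred Value` = ppv,
    `Neg Pred Value` = npv,
    Precision = ppv,
    Recall = sens,
    F1 = 2 * ppv * sens / (ppv + sens),
    Prevalence = (tp + fn) / n,
    `Detection Rate` = tp / n,
    `Detection Prevalence` = (tp + fp) / n,
    `Balanced Accuracy` = (sens + spec) / 2)
}

# Toy labels for five sites
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
perf_from_labels(truth, pred)
```

For these toy vectors (2 true positives, 1 false negative, 1 false positive, 1 true negative) the accuracy is 0.6 and the balanced accuracy is 7/12.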

Usage

classPerform(
  LR,
  min.tv = 0.25,
  tv.cut,
  cutoff,
  tv.col,
  div.col = NULL,
  pval.col = NULL,
  stat = 1
)

Arguments

LR

A list of GRanges objects, a GRangesList, or a CompressedGRangesList object. Each GRanges object in the list must contain, among its metacolumns, the methylated (mC) and unmethylated (uC) counts. The name of each list element must match a control or a treatment name.

min.tv

Minimum value for the total variation distance (TVD; absolute value of the difference of methylation levels, TVD = abs(TV)). Only sites/ranges k with TVD_{k} > min.tv are analyzed. Default: min.tv = 0.25.
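As a hedged illustration of the min.tv filter, the sketch below assumes the TV values of column tv.col have already been extracted into a plain numeric vector. filter_by_tvd is a hypothetical helper, and the strict inequality follows the description above.

```r
# Hypothetical sketch: keep only sites k with TVD_k = |TV_k| > min.tv.
# 'tv' stands in for the values stored in column tv.col of a GRanges object.
filter_by_tvd <- function(tv, min.tv = 0.25) {
  which(abs(tv) > min.tv)   # indices of the sites that are analyzed
}

tv <- c(0.10, -0.30, 0.25, 0.60, -0.05)
filter_by_tvd(tv)   # returns 2 4 (0.25 is not strictly greater than min.tv)
```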

tv.cut

A cutoff for the total variation distance to be applied to each site/range. If tv.cut is provided, then sites/ranges k with TVD_{k} < tv.cut are considered true negatives and those with TVD_{k} > tv.cut true positives. Its value must be NULL or a number 0 < tv.cut < 1.

cutoff

A cutoff value for the divergence of methylation levels or for the p-value, i.e., for the magnitude given in div.col or in pval.col, respectively (see below). Values greater than 'cutoff' are predicted TRUE (positives); all others are predicted FALSE (negatives).
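Taken together, tv.cut and cutoff define the reference and predicted labels that feed the confusion matrix. A minimal sketch, assuming the TVD and divergence values are available as numeric vectors (already filtered by min.tv) and that sites with TVD exactly equal to tv.cut count as negatives, a case the description leaves open. label_sites is hypothetical.

```r
# Hypothetical sketch: reference labels from TVD, predicted labels from the
# divergence, cross-tabulated into a 2 x 2 confusion table.
label_sites <- function(tvd, div, tv.cut, cutoff) {
  # Assumption: TVD == tv.cut is treated as a negative
  truth <- factor(tvd > tv.cut, levels = c(FALSE, TRUE))
  pred  <- factor(div > cutoff, levels = c(FALSE, TRUE))
  table(prediction = pred, reference = truth)
}

tvd <- c(0.30, 0.45, 0.50, 0.35, 0.60)
div <- c(50, 80, 40, 70, 90)
label_sites(tvd, div, tv.cut = 0.4, cutoff = 68.7)
```

Fixing the factor levels keeps the table 2 x 2 even when one class is absent from a small data set.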

tv.col

Column number for the total variation distance (TVD; absolute value of the difference of methylation levels, TVD = abs(TV)).

div.col

Column number of the divergence variable used in the performance analysis and in the estimation of the cutpoints. Default: NULL. One of div.col or pval.col must be given.

pval.col

Column number of the p-value used in the performance analysis and in the estimation of the cutpoints. Default: NULL. One of div.col or pval.col must be given.

stat

An integer indicating the statistic to be used in the testing. The mapping to statistic names is: 0 = 'All', 1 = 'Accuracy', 2 = 'Sensitivity', 3 = 'Specificity', 4 = 'Pos Pred Value', 5 = 'Neg Pred Value', 6 = 'Precision', 7 = 'Recall', 8 = 'F1', 9 = 'Prevalence', 10 = 'Detection Rate', 11 = 'Detection Prevalence', 12 = 'Balanced Accuracy'.
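The stat codes amount to a simple index into the statistic names. A hypothetical lookup, written only to mirror the mapping listed above; the package's internal handling of stat may differ.

```r
# Hypothetical lookup mirroring the documented 'stat' codes (0 = all statistics).
stat_names <- c("Accuracy", "Sensitivity", "Specificity", "Pos Pred Value",
                "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence",
                "Detection Rate", "Detection Prevalence", "Balanced Accuracy")

stat_name <- function(stat) {
  # Reject anything outside the documented range 0..12
  stopifnot(length(stat) == 1, stat %in% 0:12)
  if (stat == 0) "All" else stat_names[stat]
}

stat_name(0)   # "All"
stat_name(8)   # "F1"
```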

Details

Samples from each group are pooled according to the statistic selected (see parameter pooling.stat), and a single GRanges object is created with the methylated and unmethylated read counts of each group (control and treatment) in its metacolumns. Thus, a contingency table can be built for each range of the GRanges object.
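For a single range, the contingency table described above can be sketched as a 2 x 2 matrix of pooled read counts. The helper range_table and its argument names are hypothetical, and taking TV as treatment minus control is an assumption; the TVD used by min.tv and tv.cut is the absolute value, so the sign does not matter there.

```r
# Hypothetical sketch: 2 x 2 contingency table for one range, built from the
# pooled methylated (mC) and unmethylated (uC) read counts of each group.
range_table <- function(mC.ctrl, uC.ctrl, mC.trt, uC.trt) {
  matrix(c(mC.ctrl, mC.trt, uC.ctrl, uC.trt), nrow = 2,
         dimnames = list(group = c("control", "treatment"),
                         reads = c("mC", "uC")))
}

tab <- range_table(mC.ctrl = 10, uC.ctrl = 30, mC.trt = 24, uC.trt = 8)
tab

# Methylation level per group, then TV (treatment - control, assumed sign)
p  <- tab[, "mC"] / rowSums(tab)
tv <- unname(p["treatment"] - p["control"])
abs(tv)   # TVD for this range: 0.75 - 0.25 = 0.5
```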

Value

A list with the classification performance results.

Author(s)

Robersy Sanchez

Examples

# load the package and the simulated data of potential methylated signal
library(MethylIT.utils)
data(sim_ps)

classPerform(LR = PS, min.tv = 0.25, tv.cut = 0.4,
             cutoff = 68.7, tv.col = 7L, div.col = 9, stat = 0)

genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.