evaluateDIMPclass | R Documentation |
For a given cutpoint (previously estimated with the function estimateCutPoint), 'evaluateDIMPclass' will return the evaluation of the classification of DMPs into two clases: DMPS from control and DMPs from treatment samples.
evaluateDIMPclass(
LR,
control.names,
treatment.names,
column = c(hdiv = FALSE, jdiv = FALSE, jdiv.stat = FALSE, TV = FALSE, bay.TV = FALSE,
wprob = FALSE, pos = FALSE),
classifier = c("logistic", "pca.logistic", "lda", "qda", "pca.lda", "pca.qda",
"random_forest"),
pval.col = NULL,
n.pc = 1,
center = FALSE,
scale = FALSE,
interactions = NULL,
output = "conf.mat",
prop = 0.6,
post.cut = 0.5,
maxnodes = NULL,
ntree = 400,
nsplit = 1L,
num.boot = 100,
num.cores = 1L,
tasks = 0L,
seed = 1234,
verbose = FALSE
)
LR |
An object from 'pDMP' class. including control and treatment GRanges containing divergence values for each DMP in the meta-column. LR is generated by the function 'selectDIMP' Each GRanges object must correspond to a sample. For example, if a sample is named 's1', then this sample can be accessed in the list of GRanges objects as LR$s1. |
control.names |
Names/IDs of the control samples, which must be include in the variable LR. |
treatment.names |
Names/IDs of the treatment samples, which must be included in the variable LR. |
column |
a logical vector for column names for the predictor variables to be used: Hellinger divergence 'hdiv', total variation 'TV', TV estimated with Bayesian correction of methylation levels 'bay.TV', probability of potential DMP 'wprob', and the relative cytosine site position 'pos' in respect to the chromosome where it is located. The relative position is estimated as (x - x.min)/(x.max - x), where x.min and x.max are the maximum and minimum for the corresponding chromosome, respectively. If 'wprob = TRUE', then Logarithm base-10 of 'wprob' will be used as predictor in place of 'wprob'. |
classifier |
Classification model to use. Option 'logistic' applies a logistic regression model; option 'lda' applies a Linear Discriminant Analysis (LDA); 'qda' applies a Quadratic Discriminant Analysis (QDA), 'pca.logistic' applies logistic regression model using the Principal Component (PCs) estimated with Principal Component Analysis (PCA) as predictor variables. pca.lda' applies LDA using PCs as predictor variables, and the option 'pca.qda' applies a Quadratic Discriminant Analysis (QDA) using PCs as predictor variables. |
pval.col |
Column number for p-value used in the performance analysis and estimation of the cutpoints. Default: NULL. If NULL it is assumed that the column is named 'wprob'. |
n.pc |
Number of principal components (PCs) to use if the classifier is not 'logistic'. In the current case, the maximun number of PCs is 4. |
center |
A logical value indicating whether the variables should be shifted to be zero centered. |
scale |
A logical value indicating whether the variables should be |
interactions |
Variable interactions to consider in a logistic regression model. Any pairwise combination of the variable 'hdiv', 'TV', 'wprob', and 'pos' can be provided. For example: 'hdiv:TV', 'wprob:pos', 'wprob:TV', etc. |
output |
Type of output to request: output = c('conf.mat', 'mc.val', 'boot.all', 'all'). See below. |
prop |
Proportion to split the dataset used in the logistic regression: group versus divergence (at DMPs) into two subsets, training and testing. |
post.cut |
Posterior probability to decide whether a DMPs belong to treatment group. Default post.cut = 0.5. |
maxnodes, ntree |
Only for Random Forest classifier
( |
nsplit |
Only for Random Forest classifier. The Random Forest
( |
num.boot |
Number of bootstrap validations to perform in the evaluation of the logistic regression: group versus divergence (at DMPs). |
num.cores, tasks |
Parameters for parallel computation using package
|
seed |
Random seed used for random number generation. |
verbose |
if TRUE, prints the function log to stdout |
The regulatory methylation signal is also an output from a natural process that continuously takes place across the ontogenetic development of the organisms. So, we expect to see methylation signal on natural ordinary conditions. Here, to distinguish a control methylation signal from a treatment, three classification models are provided: 1) logistic, 2) Linear Discriminant Analysis (LDA) and 3) Quadratic Discriminant Analysis (QDA). In particular, four predictor variables can be used: Hellinger divergence 'hdiv', total variation 'TV', probability of potential DMP 'wprob' and DMP genomic coordinated 'pos'. Principal component analysis (PCA) is used to convert a set of observations of possibly correlated predictor variables into a set of values of linearly uncorrelated variables (principal components, PCs). The PCs are used as new, uncorrelated predictor variables for LDA, QDA, and logistic classifiers.
A classification result with low accuracy and compromising values from other classification performance indicators (see below) suggest that the treatment does not induce a significant regulatory signal different from control.
Setting output = 'conf.mat' will perform a logistic regression group versus divergence (at DMPs) to evaluate the discrimination between control-DMPs and treatment-DMPs. The evaluation of this classification is provided through the function 'confusionMatrix' from R package 'caret'. 'mc.val' will perform a 'num.boot'-times Monte Carlo (bootstrap) validation and return a summary. By default function 'confusionMatrix' from R package caret' randomly splits the sample into two subsets, training and testing, according to the supplied proportion 'prop' (i.e., prop = 0.6). After selecting output = 'mc.val', the function 'confusionMatrix' will be executed 'num.boot'-times, each time performing a different random split of the sample. 'boot.all' same as 'mc.val' plus a matrix with statistcs reported by 'confusionMatrix'. 'all' return a list with all the mentioned outputs.
Robersy Sanchez (https://genomaths.com).
## Get a data set of DMPs
data(dmps, package = 'MethylIT')
## Classification of DMPs into two clases: DMPS from control and DMPs
## from treatment samples and evaluation of the classifier performance
## (for more details see ?evaluateDIMPclass).
perf <- evaluateDIMPclass(LR = dmps,
column = c(hdiv = TRUE, TV = TRUE,
wprob = TRUE, pos = TRUE),
classifier = 'lda', n.pc = 4L,
control.names = c('C1', 'C2', 'C3'),
treatment.names = c('T1', 'T2', 'T3'),
center = TRUE, scale = TRUE, prop = 0.6)
## Model classification performance
perf$Performance
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.