evaluateDE: Compute the confusion matrix-related quantities from...

View source: R/Evaluate.R

evaluateDER Documentation

Compute the confusion matrix-related quantities from simulation results

Description

This function takes the simulation output from simulateDE and computes quantities of the confusion matrix for statistical power evaluation.

Usage

evaluateDE(simRes,
alpha.type=c("adjusted","raw"),
MTC=c('BY', 'BH', 'holm', 'hochberg', 'hommel', 'bonferroni', 'Storey', 'IHW'),
alpha.nominal=0.1,
stratify.by=c("mean", "dispersion", "dropout", "lfc"),
strata.probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
filter.by=c("none", "mean", "dispersion", "dropout"),
strata.filtered=1, target.by=c("lfc", "effectsize"), delta=0,
Table = TRUE)

Arguments

simRes

The result from simulateDE.

alpha.type

A string to represent the way to call DE genes. Available options are "adjusted" i.e. applying multiple testing correction and "raw" i.e. using p-values. Default is "adjusted".

MTC

Multiple testing correction method to use. Available options are 1) see p.adjust.methods, 2) Storey's qvalue see qvalue and 3) Independent Hypothesis Weighting considering mean expression as covariate (see ihw). Default is BY, i.e. Benjamini-Yekutieli FDR correction method.

alpha.nominal

The nomial level of significance. Default is 0.1.

stratify.by

A string to represent the way to stratify genes. Available options are "mean", "dispersion", "dropout" and "lfc", for stratifying genes by average expression levels, dispersion, dropout rates or estimated log2 fold changes.

strata.probs

A vector specifying the probability values for sample quantiles of the strata. See qvalue.

filter.by

A string to represent the way to filter genes. This is used in conjunction with strata.filtered for gene filtering. Available options are "none", "mean", "dispersion" and "dropout". "none" stands for no filtering, thus all genes will be considered. "mean" stands for filtering based on average gene expression levels. "dispersion" stands for filtering based on gene expression dispersion. "dropout" stands for filtering based on dropout rates.

strata.filtered

The strata to be filtered out in computing error matrix-related quantities. Genes falling into these strata will be excluded. See "Details" for more description of gene filtering.

target.by

A string to specify the method to define "biologically important" DE genes. Available options are (1) "lfc": interesting genes are defined by absolute log2 fold changes. (2) "effectsize": interesting genes are defined by absolute log2 fold changes divided by the square root of 1/(mean+dispersion).

delta

A threshold used for defining "biologically important" genes. Genes with absolute log2 fold changes (when target.by is "lfc") or effect sizes (when target.by is "effectsize") greater than this value are deemed DE in error rates calculations. If delta=0 then no threshold is applied. See "Details" for more description.

Table

A logical vector. If the default TRUE, then a table of marginal error rates is printed additionally (see printEvalDE).

Details

This is the main function to compute various power-related quantities, using stratification and filtering.

Gene stratification

We recommend to compute and visualize error rates (especially TPR) conditional on expression characteristics like mean, dispersion and/or dropout rate. It is likely that the power to detect DE genes is strongly dependent on mean expression levels even though the magnitude of effect sizes is the same. The stratified results will provide a more comprehensive power assessment and better guide the investigators in experimental designs and analysis strategies.

Gene filtering

Sometimes it is advisible to filter out some genes (such as the ones with very low mean expression) before DE detection. The filtering option here provides an opportunity to compare the rates before and after filtering.

Define biologically interesting genes

We provide two options to define biologically interesting genes: by absolute values of log fold changes or effect sizes (absolute values of log fold changes divided by the square root of 1/(mean+dispersions)). Genes with these quantities over a threshold are deemed interesting, and the rate calculations are based on these genes.

Value

A list with the following entries:

TN, TP, FP, FN, TNR, TPR, FPR, FNR, FDR

3D array representing the number of true negatives, true positives, false positives, false negatives and their proportions/rates as well as false discovery rate for all simulation settings. The dimension of the arrays are nstrata * N * nsims. Here nstrata is number of specified strata. N is number of different sample sizes settings, and nsims is number of simulations.

TN.marginal, TP.marginal, FP.marginal, FN.marginal

Matrix representing the number of true negatives, true positives, false positives, false negatives for all simulation settings. The dimension of the matrices are N * nsims. Here N is number of different sample sizes settings, and nsims is number of simulations.

TNR.marginal, TPR.marginal, FPR.marginal, FNR.marginal, FDR.marginal

Matrix representing the marginal rates for all simulation settings. The dimension of the matrices are N * nsims.

stratagenes, stratadiffgenes

Number of genes per stratum and number of DE genes per stratum.

stratify.by

The input "stratify.by".

strata

The input strata.

n1,n2

Sample sizes per group. This is taken from the simulation options.

target.by

The input method to define "biologically important" DE genes, either by log fold change or effect size.

delta

The input delta for biologically important genes. If delta=0, all target.by will be considered.

Author(s)

Beate Vieth

See Also

estimateParam for negative binomial parameters, Setup for setting up simulation parameters and simulateDE for simulating differential expression and plotEvalDE for visualisation.

Examples

## Not run: 
# estimate gene parameters
data("Bulk_Read_Counts")
data("GeneLengths_hg19")
estparam_gene <- estimateParam(countData = Bulk_Read_Counts,
                               readData = NULL,
                               batchData = NULL,
                               spikeData = NULL, spikeInfo = NULL,
                               Lengths = GeneLengths_hg19, MeanFragLengths = NULL,
                               RNAseq = 'bulk', Protocol = 'Read',
                               Distribution = 'NB', Normalisation = "MR",
                               GeneFilter = 0.25, SampleFilter = 3,
                               sigma = 1.96, NCores = NULL, verbose = TRUE)
# define log fold change
p.lfc <- function(x) sample(c(-1,1), size=x,replace=T)*rgamma(x, shape = 2, rate = 2)
# set up simulations
setupres <- Setup(ngenes = 10000, nsims = 10,
                  p.DE = 0.1, pLFC = p.lfc,
                  n1 = c(3,6,12), n2 = c(3,6,12),
                  Thinning = c(1,0.9,0.8), LibSize = 'given',
                  estParamRes = estparam_gene,
                  estSpikeRes = NULL,
                  DropGenes = FALSE,
                  sim.seed = 4379, verbose = TRUE)
# run simulation
simres <- simulateDE(SetupRes = setupres,
                     Prefilter = NULL, Imputation = NULL,
                     Normalisation = 'MR', Label = 'none',
                     DEmethod = "limma-trend", DEFilter = FALSE,
                     NCores = NULL, verbose = TRUE)
# DE evaluation
evalderes <- evaluateDE(simRes = simres, alpha.type="adjusted",
                        MTC='BH', alpha.nominal=0.05,
                        stratify.by = "lfc", filter.by = "mean",
                        strata.filtered = 1,
                        target.by = "lfc", delta = 0.25)
plotEvalDE(evalRes = evalderes, rate = "marginal")
plotEvalDE(evalRes = evalderes, rate = "conditional")

## End(Not run)

bvieth/powsimR documentation built on Aug. 19, 2023, 7:48 p.m.