getPower: Power calculation for designing MPRA experiments

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/getPower.R

Description

This function computes the power (or type I error if under the null) of the specified tests using simulated MPRA data with user-specified parameters for the MPRA experiment.

Usage

1
getPower(nsim = 100, ntag = 10, nrepIn = 2, nrepOut=2, slope = 1, scenario=c("fixInputDist", "fixTotalDepth", "fixMeanDepth"), method=c("MW", "Matching", "Adaptive", "Fisher", "QuASAR", "T-test", "mpralm", "edgeR", "DESeq2"), fixInput  = c(20, 100), fixMeanD = 70 , fixTotalD= 20000000, std_A=mean_A, std_B=mean_B, inputDist=NULL, inputDispFunc=NULL, outputDispFunc=NULL, inputDispParam=NULL, outputDispParam=NULL, cutoff=-1, cutoffo=-1,  p.adjust.method="fdr", significance=0.05)

Arguments

nsim

An integer indicating the number of simulations or number of SNPs included in the dataset

ntag

An integer indicating the number of tags/barcodes for each oligonucleotide (oligos) or each allele

nrepIn

An integer indicating the number of DNA replicates

nrepOut

An integer indicating the number of RNA replicates

slope

A numeric vector of length 1 or 2*nsim*ntag specifying the transfection efficiencies across the oligos in the simulated data. The first half is for the reference allele, and the second half is for the mutant allele. To compute the power, the first half would be different from the values in the second half of the vector (under the alternative). This is referred as 'b' in the paper. See examples below.

scenario

Three different simulation scenarios are included. This includes \ fixInputDist - fix the mean and sd of the true input counts across the tags for the two alleles. The same input distribution is used for all simulations/SNPs; fixTotalDepth - Sample the true input counts from the default/given distribution, and scale to the specified total depth. fixMeanDepth - sample the true input counts from the default/given distrubtion, and scale it so the mean depth across all tags equals the specified mean depth.

method

Accepts a vector of characters specifying the tests to be used. The possible options are: MW, Matching, Adaptive, Fisher, QuASAR, T-test, mpralm, edgeR and DESeq2.

fixInput

A vector of two numeric values specifying the mean of input count for the reference and alternative alleles for all SNPs.

fixMeanD

If scenario is fixTotalDepth, a numeric number is expected, which is the total reads across all tags.

fixTotalD

If scenario is fixMeanDepth, a numeric value is expected, which is the mean reads across all tags.

std_A

An optional parameter specifying the standard devaition of the mean input counts across tags for the reference allele.

std_B

An optional parameter specifying the standard devaition of the mean input counts across tags for the mutant allele.

inputDist

A vector of numerical proportions that can be used to generate the DNA proportions across the tags.

inputDispFunc

Optional parameter that provides a dispersion function estimated for the input replicates using DESeq2.

outputDispFunc

Optional parameter that provides a dispersion function estimated for the output replicates using DESeq2.

inputDispParam

This parameter is required if inputDispFunc is not provided. It should give three parameter estimates for the dispersion function of the DNA input replicates. The three parameters correspond to a0, a1, and d2, which specify that the dispersion parameter is a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean of DNA count across the replicates.

outputDispParam

This parameter is required if outputDispFunc is not provided. It should give three parameter estimates for the dispersion function of the RNA output replicates. The three parameters correspond to a0, a1, and d2, which specify that the dispersion parameter is a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean of RNA count across the replicates.

cutoff

A numeric or integer value. Tags with DNA count less than or equal to cutoff in any of the DNA replicates will be removed.

cutoffo

A numeric or integer value. Tags with RNA count less than or equal to cutoffo in any of the RNA replicates will be removed.

p.adjust.method

A character string. The correction method for multiple comparisons, the options are: holm, hochberg, hommel, bonferroni, BH, BY, fdr, and none. See stats.

significance

The significance level used to estimate power.

Details

This function simulates MPRA data according to user-specified parameters, and computes the power/type I error of the tests requested using the simulated data. Before analyzing the simulated data, normalization and filtering is performed.

Value

Power

The power/type I error of the specified tests.

simData

The simulated data frame.

results

The actual p-values of all the SNPs for the specified tests in the simulated data.

Author(s)

Dandi Qiao

References

Qiao, D., Zigler, C., Cho, M.H., Silverman, E.K., Zhou, X., et al. (2018). Statistical considerations for the analysis of massively parallel reporter assays data.

See Also

atMPRA

Examples

1
2
3
4
5
6
7
8
9
data(GSE70531_params) 
ntag= 10
nsim= 10
nrepIn=2
nrepOut = 2
inputDist= GSE70531_params[[3]](runif(nsim*ntag*2))
inputDispFunc=GSE70531_params[[1]]
outputDispFunc=GSE70531_params[[2]]
result = getPower(nsim, ntag, nrepIn, nrepOut, slope = 1, scenario="fixInputDist", method=c("MW","T-test", "mpralm", "edgeR", "DESeq2"), fixInput  = c(20, 100), inputDist=inputDist, inputDispFunc=inputDispFunc, outputDispFunc=outputDispFunc,  cutoff=-1, cutoffo=-1)

redaq/atMPRA documentation built on July 24, 2020, 2:40 a.m.