sim_fixDepth: Simulating MPRA data
In redaq/atMPRA: Analysis Toolset for MPRA data


View source: R/sim_fixDepth.R

This function simulates an MPRA dataset with a specified input distribution and total read depth across tags/barcodes.

sim_fixDepth(inputProp, ntag, nsim, nrepIn, nrepOut, slope, inputDispFunc=NULL, outputDispFunc=NULL, sampleDepth=NULL, inputDispParam=NULL, outputDispParam=NULL, meanDepth=NULL)

`inputProp`	A numeric vector giving the input DNA count distribution, that is, the relative abundance of reads across tags. It should contain 2*ntag*nsim proportions, one for each tag.
`ntag`	An integer indicating the number of tags/barcodes for each oligonucleotide (oligo) or each allele.
`nsim`	An integer indicating the number of simulations or number of SNPs included in the dataset.
`nrepIn`	An integer indicating the number of replicates for the DNA input.
`nrepOut`	An integer indicating the number of replicates for the RNA output.
`slope`	A numeric vector of length 2*nsim*ntag specifying the transfection efficiencies across the oligos in the simulated data. The first half is for the reference allele, and the second half is for the mutant allele. To compute power (under the alternative), the values in the first half should differ from those in the second half. This is referred to as 'b' in the paper. See examples below.
`inputDispFunc`	Optional parameter that provides a dispersion function estimated for the input replicates using DESeq2.
`outputDispFunc`	Optional parameter that provides a dispersion function estimated for the output replicates using DESeq2.
`sampleDepth`	An integer vector specifying the total read depth over all tags. It can be of length 1 or of length nrepIn+nrepOut. If it is of length 1, the same total depth is used for all DNA and RNA replicates. If this is specified, the value of `meanDepth` is ignored.
`inputDispParam`	This parameter is required if inputDispFunc is not provided. It should give three parameter estimates for the dispersion function of the DNA input replicates. The three parameters correspond to a0, a1, and d2, which specify that the dispersion parameter follows a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean DNA count across the replicates.
`outputDispParam`	This parameter is required if outputDispFunc is not provided. It should give three parameter estimates for the dispersion function of the RNA output replicates. The three parameters correspond to a0, a1, and d2, which specify that the dispersion parameter follows a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean RNA count across the replicates.
`meanDepth`	An integer vector specifying the mean read depth over all tags. It could be of length 1 or length nrepIn+nrepOut. If it is of length 1, the same mean depth is used for all DNA and RNA replicates.
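
To make the a0, a1, d2 parameterisation concrete, the following base R sketch draws dispersions from the lognormal model described for `inputDispParam` and `outputDispParam`. It is an illustration only, not the package's internal code: the parameter values and mean counts are made up, and it assumes that the stated mean and sd are on the log scale (the `meanlog` and `sdlog` arguments of `rlnorm`).

```r
# Sketch only: hypothetical values, not estimates from any real dataset.
set.seed(1)
dispParam <- c(a0 = 0.01, a1 = 2, d2 = 0.5)  # hypothetical a0, a1, d2
mu <- c(10, 100, 1000)                       # hypothetical mean counts per tag

# Assumption: log(a0 + a1/mu) is the log-scale mean and d2 the log-scale sd.
meanlog <- log(dispParam[["a0"]] + dispParam[["a1"]] / mu)

# One dispersion draw per tag.
disp <- rlnorm(length(mu), meanlog = meanlog, sdlog = dispParam[["d2"]])
```

With a1 > 0, the log-scale mean decreases as mu grows, so tags with low mean counts receive larger dispersions on average.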

`datt`	A simulated data frame with ntag*nsim*2 rows and 2+nrep*2 columns. The first two columns give the allele and SNP name for each tag. The remaining columns hold the generated DNA and RNA counts for the nrep replicates.
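
As a quick sanity check on these dimensions, the arithmetic can be written out in base R. This is a sketch that assumes the 2+nrep*2 count columns split into nrepIn DNA columns and nrepOut RNA columns, which matches the stated formula when nrepIn and nrepOut are equal; the values are those used in the Examples section.

```r
# Values taken from the Examples section of this page.
ntag <- 10
nsim <- 10
nrepIn <- 5
nrepOut <- 5

# One row per tag, for each of the two alleles of each SNP.
expected_rows <- ntag * nsim * 2
# Allele and SNP name columns, then (assumed) one column per replicate.
expected_cols <- 2 + nrepIn + nrepOut

# If `datt` from the Examples section is available, it could be compared with:
# stopifnot(nrow(datt) == expected_rows, ncol(datt) == expected_cols)
```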

Dandi Qiao

Qiao, D., Zigler, C., Cho, M.H., Silverman, E.K., Zhou, X., et al. (2018). Statistical considerations for the analysis of massively parallel reporter assays data.

atMPRA

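# Load the example parameters (the object `getParam` used below)
data(GSE70531_params)
inputDispFunc = getParam[[1]]   # dispersion function for the DNA input
outputDispFunc = getParam[[2]]  # dispersion function for the RNA output
totalDepth = 200000             # same total read depth for every replicate
ntag = 10
nsim = 10
nrepIn = 5
nrepOut = 5
# One input proportion per tag (2 alleles x nsim SNPs x ntag tags)
inputProp = getParam[[3]](runif(ntag*nsim*2))
# One slope per allele and SNP, repeated for each of its ntag tags
slopel = getParam[[4]](runif(nsim*2))
slope = rep(slopel, each=ntag)

datt = sim_fixDepth(inputProp, ntag, nsim, nrepIn, nrepOut, slope,
                    inputDispFunc=inputDispFunc, outputDispFunc=outputDispFunc,
                    sampleDepth=totalDepth)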
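
The `slope` vector can also be built by hand, which makes the layout (reference allele in the first half, mutant allele in the second half) explicit. The base R sketch below is illustrative only: the slope values and the 1.5-fold allelic effect are hypothetical, and using identical halves to represent the null is an assumption inferred from the description of `slope`.

```r
ntag <- 10
nsim <- 10
set.seed(42)

# Hypothetical per-SNP slopes for the reference allele.
slope_ref <- runif(nsim, min = 0.5, max = 2)

# Assumed null: the mutant allele has the same slope as the reference allele.
slope_null <- rep(c(slope_ref, slope_ref), each = ntag)

# Alternative: mutant slopes scaled by a hypothetical 1.5-fold effect.
slope_alt <- rep(c(slope_ref, slope_ref * 1.5), each = ntag)

# Both vectors have the required length 2*nsim*ntag.
half <- nsim * ntag
```

Either vector could then be passed as the `slope` argument of `sim_fixDepth` in place of the one built in the example above.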