sim_fixInputMean: Simulating MPRA data


View source: R/sim_fixInputMean.R

Description

This function simulates an MPRA dataset with specified input distribution and mean depth across tags/barcodes.

Usage

sim_fixInputMean(mean_A, mean_B, ntag, nsim, nrepIn, nrepOut, slope, inputDist=NULL, std_A=mean_A, std_B=mean_B, inputDispFunc=NULL, outputDispFunc=NULL, inputDispParam=NULL, outputDispParam=NULL)

Arguments

mean_A

The mean of the true input counts across tags for the reference allele.

mean_B

The mean of the true input counts across tags for the mutant allele.

ntag

An integer indicating the number of tags/barcodes for each oligonucleotide (oligo), that is, for each allele.

nsim

An integer indicating the number of simulations or number of SNPs included in the dataset.

nrepIn

An integer indicating the number of DNA input replicates.

nrepOut

An integer indicating the number of RNA output replicates.

inputDist

Required if std_A and std_B are not specified. A vector of proportions from which the input proportions across tags are sampled.

std_A

An optional parameter specifying the standard deviation of the mean input counts across tags for the reference allele.

std_B

An optional parameter specifying the standard deviation of the mean input counts across tags for the mutant allele.

inputDispFunc

Optional parameter that provides a dispersion function estimated for the input replicates using DESeq2.

outputDispFunc

Optional parameter that provides a dispersion function estimated for the output replicates using DESeq2.

inputDispParam

Required if inputDispFunc is not provided. A vector of three parameter estimates for the dispersion function of the DNA input replicates. The three parameters are a0, a1, and d2: the dispersion parameter follows a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean DNA count across the replicates.

outputDispParam

Required if outputDispFunc is not provided. A vector of three parameter estimates for the dispersion function of the RNA output replicates. The three parameters are a0, a1, and d2: the dispersion parameter follows a lognormal distribution with mean log(a0+a1/mu) and sd d2, where mu is the mean RNA count across the replicates.
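
The role of a0, a1, and d2 can be illustrated with a minimal base-R sketch. It is not the package's implementation. It assumes that "mean log(a0+a1/mu)" and "sd d2" refer to the log-scale parameters of the lognormal (meanlog and sdlog in rlnorm), and that counts are then drawn from a negative binomial with the DESeq2-style variance mu + dispersion*mu^2. The helper sample_dispersion and all numeric values are hypothetical.

```r
# Hypothetical helper, not part of atMPRA: draw one dispersion value per mean count.
# ASSUMPTION: log(a0 + a1/mu) and d2 are the log-scale mean and sd of the lognormal.
sample_dispersion <- function(mu, a0, a1, d2) {
  rlnorm(length(mu), meanlog = log(a0 + a1 / mu), sdlog = d2)
}

set.seed(1)
mu <- c(10, 100, 1000)                         # illustrative mean counts
disp_param <- c(a0 = 0.01, a1 = 2, d2 = 0.5)   # made-up values, in the order a0, a1, d2

disp <- sample_dispersion(mu,
                          a0 = disp_param[["a0"]],
                          a1 = disp_param[["a1"]],
                          d2 = disp_param[["d2"]])

# ASSUMPTION: negative binomial counts with variance mu + disp * mu^2,
# which corresponds to size = 1 / disp in rnbinom's mu parameterisation.
counts <- rnbinom(5, mu = mu[1], size = 1 / disp[1])   # 5 replicates for the first mean
```

Because a1/mu shrinks as mu grows, the centre of the dispersion distribution moves toward a0 for tags with high mean counts.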

Value

datt

A simulated data frame with ntag*nsim*2 rows and 2+nrep*2 columns. The first two columns give the allele and SNP name for each tag. The remaining columns hold the simulated DNA and RNA counts for the nrep replicates.

Author(s)

Dandi Qiao

References

Qiao, D., Zigler, C., Cho, M.H., Silverman, E.K., Zhou, X., et al. (2018). Statistical considerations for the analysis of massively parallel reporter assays data.

See Also

atMPRA

Examples

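library(atMPRA)
data(GSE70531_params)  # assumed to load the list getParam used below

ntag <- 10     # tags/barcodes per allele
nsim <- 10     # number of SNPs
nrepIn <- 5    # DNA input replicates
nrepOut <- 5   # RNA output replicates

# nsim * 2 slope values, each repeated ntag times
slopel <- getParam[[4]](runif(nsim * 2))
slope <- rep(slopel, each = ntag)

# Input proportions across tags, and the input/output dispersion functions
inputDist <- getParam[[3]](runif(nsim * ntag * 2))
inputDispFunc <- getParam[[1]]
outputDispFunc <- getParam[[2]]

datt <- sim_fixInputMean(mean_A = 10, mean_B = 100, ntag = ntag, nsim = nsim, nrepIn = nrepIn, nrepOut = nrepOut, slope = slope, inputDist = inputDist, inputDispFunc = inputDispFunc, outputDispFunc = outputDispFunc)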

redaq/atMPRA documentation built on July 24, 2020, 2:40 a.m.