estimateMPRA: Estimate the modeling parameters for MPRA data provided.
In redaq/atMPRA: Analysis Toolset for MPRA data


View source: R/estimateMPRA.R

This function estimates the dispersion functions for the input and output, the distribution of the true input proportions across the tags, and the distribution of the estimated transfection efficiency across the SNPs.

estimateMPRA(datt, nrepIn, rnaCol, nrepOut, nsim, ntag, plotFigure=FALSE, plotName="simData")

`datt`	A data frame containing the MPRA dataset. It should have nsim*ntag*2 rows (one per tag, per allele, per SNP) and 2+nrepIn+nrepOut columns. The first column should be named 'allele' and the second 'simN'. The 'allele' column should contain only the two values 'Ref' and 'Mut', which identify the two alleles of each SNP.
`nrepIn`	An integer indicating the number of DNA replicates.
`rnaCol`	An integer indicating the starting column of the RNA replicates in `datt`.
`nrepOut`	An integer indicating the number of RNA replicates.
`nsim`	An integer indicating the number of SNPs/comparisons in the MPRA data. A comparison refers to the unit of two alleles that is tested for allele-specific expression.
`ntag`	An integer indicating the number of tags/barcodes for each allele.
`plotFigure`	A logical value indicating whether the plot of the dispersion functions and the distributions of the input proportions and estimated transfection efficiencies should be made.
`plotName`	A character value specifying the name of the plot if it's being made.
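
The layout expected for `datt` can be checked before calling the function. The following is a minimal sketch in base R: `check_mpra_layout` and the toy data frame are hypothetical illustrations, not part of atMPRA, and the check on `rnaCol` only assumes that the RNA replicates occupy `nrepOut` consecutive columns starting at `rnaCol`.

```r
# Hypothetical helper (not part of atMPRA): verifies the documented layout of datt.
check_mpra_layout <- function(datt, nrepIn, rnaCol, nrepOut, nsim, ntag) {
  stopifnot(
    is.data.frame(datt),
    nrow(datt) == nsim * ntag * 2,            # one row per tag, allele and SNP
    ncol(datt) == 2 + nrepIn + nrepOut,       # 2 ID columns + DNA + RNA replicates
    identical(names(datt)[1:2], c("allele", "simN")),
    all(datt$allele %in% c("Ref", "Mut")),
    # Assumption: RNA replicates are nrepOut consecutive columns from rnaCol
    rnaCol + nrepOut - 1 <= ncol(datt)
  )
  invisible(TRUE)
}

# Toy data with 2 SNPs, 3 tags per allele, 2 DNA and 2 RNA replicates.
nsim <- 2; ntag <- 3; nrepIn <- 2; nrepOut <- 2
toy <- data.frame(
  allele = rep(c("Ref", "Mut"), each = nsim * ntag),
  simN   = rep(rep(seq_len(nsim), each = ntag), times = 2),
  matrix(rpois(nsim * ntag * 2 * (nrepIn + nrepOut), lambda = 50),
         nrow = nsim * ntag * 2)
)

layout_ok <- check_mpra_layout(toy, nrepIn, rnaCol = 5, nrepOut, nsim, ntag)
```

The helper stops with an error on the first violated condition, so a layout problem surfaces before any estimation is attempted.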

This function normalizes the input and output replicates using DESeq2 and estimates the dispersion as a function of the normalized mean count, separately for the input and the output. It uses the Turing algorithm to estimate the true input proportions across tags from the MPRA data provided. It estimates the transfection efficiencies across the SNPs using the normalized RNA/DNA ratios.
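
To make the normalization step more concrete, the sketch below computes median-of-ratios size factors, the approach DESeq2 uses by default, in base R. It is a simplified illustration and not the code of atMPRA or DESeq2; the function name `median_ratio_size_factors` and the toy counts are assumptions made for this example.

```r
# Simplified median-of-ratios size factors (illustration only).
median_ratio_size_factors <- function(counts) {
  log_counts   <- log(counts)
  log_geo_mean <- rowMeans(log_counts)     # log geometric mean of each row
  usable       <- is.finite(log_geo_mean)  # drops rows containing a zero count
  apply(log_counts, 2, function(col) {
    exp(median((col - log_geo_mean)[usable]))
  })
}

# Toy counts: rep2 was sequenced at twice the depth of rep1.
counts <- cbind(rep1 = c(10, 20, 30, 40),
                rep2 = c(20, 40, 60, 80))

sf         <- median_ratio_size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")    # divide each column by its size factor
```

After dividing by the size factors the two toy replicates have identical normalized counts, which is the property the dispersion estimates rely on when they are expressed as a function of the normalized mean.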

`dispFunc_input`	The dispersion function across the input replicates estimated using DESeq2.
`dispFunc_output`	The dispersion function for the output replicates estimated using DESeq2.
`inputProp`	A function to generate proportions from the estimated true proportion of input counts across tags.
`transEff`	A function to generate transfection efficiencies from the estimated transfection efficiencies across the oligos.
`sizeFactor_input`	The normalization factor for the input replicates generated by DESeq2.
`sizeFactor_output`	The normalization factor for the output replicates generated by DESeq2.

Dandi Qiao

Qiao, D., Zigler, C., Cho, M.H., Silverman, E.K., Zhou, X., et al. (2018). Statistical considerations for the analysis of massively parallel reporter assays data.

atMPRA

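library(atMPRA)
data(GSE70531_params)  # example parameter set shipped with the package
totalDepth <- 200000   # total depth, passed to sim_fixDepth as sampleDepth
ntag <- 10             # tags per allele
nsim <- 10             # SNPs
nrepIn <- 5            # DNA replicates
nrepOut <- 5           # RNA replicates
# One input proportion per tag and allele
inputProp <- getParam[[3]](runif(ntag * nsim * 2))
# Slope of 1 for the first ntag*nsim tags and 1.5 for the rest
slope <- c(rep(1, ntag * nsim), rep(1.5, ntag * nsim))
datt <- sim_fixDepth(inputProp, ntag, nsim, nrepIn, nrepOut, slope, sampleDepth = totalDepth)

# With 2 ID columns and 5 DNA replicates ahead of them, the RNA replicates start at column 8
rnaCol <- 8
result <- estimateMPRA(datt, nrepIn, rnaCol, nrepOut, nsim, ntag)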