estimateMPRA: Estimate the modeling parameters for MPRA data provided.


View source: R/estimateMPRA.R

Description

This function estimates the dispersion functions for the input and output, the distribution of the true input proportions across the tags, and the distribution of the estimated transfection efficiencies across the SNPs.

Usage

estimateMPRA(datt, nrepIn, rnaCol, nrepOut, nsim, ntag, plotFigure=FALSE, plotName="simData")

Arguments

datt

A data frame containing the MPRA dataset. It should have nsim*ntag*2 rows and 2+nrepIn+nrepOut columns. The first column should be named 'allele', and the second column should be named 'simN'. The 'allele' column should contain only two possible values, 'Ref' and 'Mut', which refer to the two allele versions of each SNP.
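The layout requirements above can be checked before calling the function. The following is a minimal sketch in base R; `check_mpra_layout` is a hypothetical helper that is not part of atMPRA, and the toy data frame uses made-up counts. The row ordering, the integer coding of 'simN', and the assumption that the DNA replicate columns come directly after the two label columns are illustrative guesses, not documented behavior.

```r
# Hypothetical helper (not part of atMPRA): checks the layout described above.
check_mpra_layout <- function(datt, nrepIn, nrepOut, nsim, ntag) {
  stopifnot(is.data.frame(datt))
  c(
    rows    = nrow(datt) == nsim * ntag * 2,
    columns = ncol(datt) == 2 + nrepIn + nrepOut,
    names   = identical(names(datt)[1:2], c("allele", "simN")),
    alleles = all(datt$allele %in% c("Ref", "Mut"))
  )
}

# Toy data frame with made-up Poisson counts, for illustration only.
nsim <- 3; ntag <- 2; nrepIn <- 2; nrepOut <- 2
toy <- data.frame(
  allele = rep(c("Ref", "Mut"), each = nsim * ntag),         # assumed ordering
  simN   = rep(rep(seq_len(nsim), each = ntag), times = 2),  # assumed coding
  matrix(rpois(nsim * ntag * 2 * (nrepIn + nrepOut), lambda = 50),
         nrow = nsim * ntag * 2)
)

# Named logical vector: one entry per requirement.
checks <- check_mpra_layout(toy, nrepIn, nrepOut, nsim, ntag)

# If the DNA columns directly follow 'allele' and 'simN' (an assumption),
# the first RNA column would be:
rnaCol <- 2 + nrepIn + 1
```

Under the same assumption, nrepIn = 5 gives rnaCol = 8.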

nrepIn

An integer indicating the number of DNA replicates.

rnaCol

An integer indicating the starting column of the RNA replicates in datt.

nrepOut

An integer indicating the number of RNA replicates.

nsim

An integer indicating the number of SNPs/comparisons in the MPRA data. A comparison refers to the unit with two alleles that is tested for allele-specific expression.

ntag

An integer indicating the number of tags/barcodes for each allele.

plotFigure

A logical value indicating whether the plot of the dispersion functions and the distributions of the input proportions and estimated transfection efficiencies should be made.

plotName

A character value specifying the name of the plot, if one is made.

Details

This function normalizes the input and output replicates using DESeq2 and estimates the dispersion function as a function of the normalized mean count, separately for the input and the output. It uses the Turing algorithm to estimate the true input proportions across tags from the MPRA data provided. It estimates the transfection efficiencies across the SNPs using the normalized RNA/DNA ratios.
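To make the normalization and ratio steps concrete, here is a rough base-R sketch of median-of-ratios size factors (the default size-factor approach in DESeq2) followed by a normalized RNA/DNA ratio per tag. It is an illustration under stated assumptions, not the code used by estimateMPRA: `size_factors` is a hypothetical helper, the counts are made up, and how atMPRA aggregates tag-level ratios into per-SNP transfection efficiencies is not described here.

```r
# Median-of-ratios size factors. Rows are tags, columns are replicates.
# A base-R sketch of the idea, not a call into DESeq2.
size_factors <- function(counts) {
  log_counts   <- log(counts)
  log_geo_mean <- rowMeans(log_counts)   # log geometric mean per tag
  keep <- is.finite(log_geo_mean)        # skip tags with a zero count
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[keep]))
  })
}

# Made-up counts: replicate 2 is sequenced twice as deep as replicate 1.
dna <- cbind(rep1 = c(10, 20, 40, 80),  rep2 = c(20, 40, 80, 160))
rna <- cbind(rep1 = c(30, 20, 120, 80), rep2 = c(60, 40, 240, 160))

sf_dna <- size_factors(dna)
sf_rna <- size_factors(rna)

# Divide each replicate (column) by its size factor.
dna_norm <- sweep(dna, 2, sf_dna, "/")
rna_norm <- sweep(rna, 2, sf_rna, "/")

# Normalized RNA/DNA ratio per tag, averaged over replicates.
ratio <- rowMeans(rna_norm) / rowMeans(dna_norm)
```

After normalization the two replicates agree, so the ratio reflects expression relative to input rather than sequencing depth.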

Value

dispFunc_input

The dispersion function across the input replicates estimated using DESeq2.

dispFunc_output

The dispersion function for the output replicates estimated using DESeq2.

inputProp

A function to generate proportions from the estimated true proportion of input counts across tags.

transEff

A function to generate transfection efficiencies from the estimated transfection efficiencies across the oligos.

sizeFactor_input

The normalization factor for the input replicates generated by DESeq2.

sizeFactor_output

The normalization factor for the output replicates generated by DESeq2.

Author(s)

Dandi Qiao

References

Qiao, D., Zigler, C., Cho, M.H., Silverman, E.K., Zhou, X., et al. (2018). Statistical considerations for the analysis of massively parallel reporter assays data.

See Also

atMPRA

Examples

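library(atMPRA)

# Load the example parameter set; the code below expects it to provide getParam
data(GSE70531_params)
totalDepth = 200000
ntag = 10
nsim = 10
nrepIn = 5
nrepOut = 5

# Generate input proportions for every tag of both alleles
inputProp = getParam[[3]](runif(ntag*nsim*2))
# Slope 1 for the first ntag*nsim tags, 1.5 for the remaining ntag*nsim tags
slope = c(rep(1, ntag*nsim), rep(1.5, ntag*nsim))
datt = sim_fixDepth(inputProp, ntag, nsim, nrepIn, nrepOut, slope, sampleDepth=totalDepth)

# First RNA column: 8 = 2 label columns + nrepIn DNA columns + 1
rnaCol = 8
result = estimateMPRA(datt, nrepIn, rnaCol, nrepOut, nsim, ntag)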

redaq/atMPRA documentation built on July 24, 2020, 2:40 a.m.