calcExp: Estimate expression of a known set of gene splicing variants.
In casper: Characterization of Alternative Splicing based on Paired-End Reads

Description Usage Arguments Value Author(s) References See Also Examples

Estimate expression of gene splicing variants, assuming that the set of variants is known. When rpkm is set to TRUE, fragments per kilobase per million are returned. Otherwise relative expression estimates are returned.

1 2	calcExp(distrs, genomeDB, pc, readLength, islandid, rpkm=TRUE, priorq=2, priorqGeneExpr=2, citype="none", niter=10^3, burnin=100, mc.cores=1, verbose=FALSE)

`distrs`	List of fragment distributions as generated by the `getDistrs` function
`genomeDB`	`knownGenome` object containing annotated genome, as returned by the `procGenome` function.
`pc`	Named vector of exon path counts as returned by `pathCounts`
`readLength`	Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set `readLength=75`.
`islandid`	Name of the gene island to be analyzed. If not specified, all gene islands are analyzed.
`rpkm`	Set to `FALSE` to return relative expression levels, i.e. the proportion of reads generated from each variant per gene. These proportions add up to 1 for each gene. Set to `TRUE` to return fragments per kilobase per million (RPKM).
`priorq`	Parameter of the prior distribution on the proportion of reads coming from each variant. The prior is Dirichlet with prior sample size for each variant equal to priorq. We recommend `priorq=2` for estimation, as it pools the estimated expression away from 0 and 1 and returned lower estimation errors than `priorq=1` in our simulated experiments.
`priorqGeneExpr`	Parameter for prior distribution on overall gene expression. Defaults to 2, which ensures non-zero estimates for all genes
`citype`	Set to `"none"` to return no credibility intervals. Set to `"asymp"` to return approximate 95% CIs (obtained via the delta method). Set to `"exact"` to obtain exact CIs via Monte Carlo simulation. Options `"asymp"` and especially `"exact"` can increase the computation time substantially.
`niter`	Number of Monte Carlo iterations. Only used when `citype=="exact"`.
`burnin`	Number of burnin Monte Carlo iterations. Only used when `citype=="exact"`.
`mc.cores`	Number of processors to be used for parallel computation. Can only be used if the package `multicore` is available for your system.
`verbose`	Set to `TRUE` to display progress information.

Expression set with expression estimates. featureNames identify each transcript via RefSeq ids, and the featureData contains further information. If citype was set to a value other than "none", the featureData also contains the 95% credibility intervals (i.e. intervals that contain the true parameter value with 95% posterior probability).

Camille Stephan-Otto Attolini, Manuel Kroiss, David Rossell

Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.

relexprByGene

data(K562.r1l1)
data(hg19DB)

#Pre-process
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
pbam0 <- procBam(bam0)
head(getReads(pbam0))

#Estimate distributions, get path counts
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
pc <- pathCounts(pbam0, DB=hg19DB)

#Get estimates
eset <- calcExp(distrs=distrs, genomeDB=hg19DB, pc=pc, readLength=75, rpkm=FALSE)
head(exprs(eset))
head(fData(eset))

#Re-normalize relative expression to add up to 1 within gene_id rather
# than island_id
eset <- relexprByGene(eset)

#Add fake sample by permuting and combine
eset2 <- eset[sample(1:nrow(eset),replace=FALSE),]
sampleNames(eset2) <- '2' #must have a different name
esetall <- mergeExp(eset,eset2)

#After merge samples are correctly matched
head(exprs(esetall))
head(fData(esetall))