wrapKnown: Run all necessary steps to get expression estimates from...
In casper: Characterization of Alternative Splicing based on Paired-End Reads

Description Usage Arguments Details Value Author(s) References See Also Examples

Function to analyze bam files to generate an ExpressionSet with expression estimates for all samples, read start and fragment length distributions, path counts and optinally processed reads.

wrapKnown(bamFile, verbose=FALSE, seed=1, mc.cores.int=1,
mc.cores=1, genomeDB, readLength, rpkm=TRUE, priorq=2, priorqGeneExpr=2,
citype='none', niter=10^3, burnin=100, keep.pbam=FALSE,
keep.multihits=TRUE, chroms=NULL)

`bamFile`	Names of bam files with the sample to analyze. These must sorted and indexed, and the index must be in the same directory.
`verbose`	Set to `TRUE` to display progress information.
`seed`	Set seed of random number generator.
`mc.cores.int`	Number of cores to use when loading bam files. This is a memory intensive step, therefore number of cores must be chosen according to available RAM memory.
`mc.cores`	Number of cores to use in expression estimation.
`genomeDB`	`annotatedGenome` object containing annotated genome, as returned by the `procGenome` function.
`readLength`	Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set `readLength=75`.
`rpkm`	Set to `TRUE` to return reads per kilobase per million (RPKM). Set to `FALSE` to return relative expression levels. Important, relative expression adds up to 1 within gene island, NOT within gene. To get relative expressions within gene run `relexprByGene` afterwards. See details.
`priorq`	Parameter of the prior distribution on the proportion of reads coming from each variant. The prior is Dirichlet with prior sample size for each variant equal to priorq. We recommend `priorq=2` for estimation, as it pools the estimated expression away from 0 and 1 and returned lower estimation errors than `priorq=1` in our simulated experiments.
`priorqGeneExpr`	Parameter for prior distribution on overall gene expression. Defaults to 2, which ensures non-zero estimates for all genes
`citype`	Set to `"none"` to return no credibility intervals. Set to `"asymp"` to return approximate 95% CIs (obtained via the delta method). Set to `"exact"` to obtain exact CIs via Monte Carlo simulation. Options `"asymp"` and especially `"exact"` can increase the computation time substantially.
`niter`	Number of Monte Carlo iterations. Only used when `citype=="exact"`.
`burnin`	Number of burnin Monte Carlo iterations. Only used when `citype=="exact"`.
`keep.pbam`	Set to `TRUE` to save processed bam object, as returned by `procBam`. This object can require substantial memory during execution and disk storage upon saving and is not needed for a default analysis.
`keep.multihits`	Set to `FALSE` to discard reads aligned to multiple positions.
`chroms`	Manually set chromosomes to be processed. By default only main chromosomes are considered (except 'chrM')

The function executes the functions procBam, getDistrs and pathCounts in parallel for each chromosome, but is much more efficient in cpu speed and memory usage than running these functions separately. Data from multiple samples are then combined using mergeExp. Note that further normalization (e.g. quantileNorm) may be needed preliminary to actual data analysis.

When rpkm is false the function returns the estimated proportion of reads arising from each isoform within a gene island. casper groups two or more genes into a gene island whenever these genes share an exon (or part of an exon). Because exons are shared, isoform quantification must be done simultaneously for all those genes.

That is, the output from wrapKnown when rpkm is FALSE are proportions that add up to 1 within each island. If you would like to re-normalize these expressions so that they add up to 1 within each gene, see the help for function relexprByGene.

One last remark: casper returns the estimated proportion of reads generated by each isoform, which is not the same as relative isoform expressions. Longer isoforms tend to produce more reads than shorter isoforms. This is easily accounted for by dividing relative expressions by isoform length, see relexprByGene.

`distr`	Object of class `readDistrs`
`pbam`	List of objects of class `procBam` with one element per chromosome
`pc`	Object of class `pathCounts`
`exp`	Object of class `ExpressionSet`

Camille Stephan-Otto Attolini, David Rossell

Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.

procGenome, relexprByGene, quantileNorm

## genDB<-makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
## hg19DB <- procGenome(genDB, "hg19")
##  bamFile="/path_to_bam/sorted.bam"
## ans <- wrapKnown(bamFile=bamFile, mc.cores.int=4, mc.cores=3, genomeDB=hg19DB, readLength=101)
##  names(ans)
##  head(exprs(ans\$exp))