computeRPKM: Compute RPKM based on gene annotations
In RIPSeeker: RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Given a list of single-end or paired-end read alignment files in BAM/SAM/BED format, compute the read counts and normalized read counts as expression of annotated transcript in the unit of "reads per kilobase of exon per million mapped reads" (RPKM).

computeRPKM(bamFiles, RIPSeekerRead = TRUE, paired = FALSE, 
	countMode = "IntersectionNotEmpty", featureGRanges, 
	idType = "ensembl_transcript_id", featureType = "exon", 
	ignore.strand = FALSE, txDbName = "biomart", 
	moreGeneInfo = FALSE, saveData, justRPKM = TRUE, ...)

`bamFiles`	A list of one or more BAM/SAM/BED alignment files.
`RIPSeekerRead`	Binary flag. If TRUE, then import and process the alignment files using the built-in function `combineAlignGals` from RIPSeeker package; if FALSE, then import the files by directly calling the required functions. The flag makes using the function outside of `RIPSeeker` package become possible.
`paired`	Binary to indicate whether the alignments files are paired-end. The alignments file must be either paired-end or single-end but not both.
`countMode`	An argument used to set the `mode` argument in the underlying function `summarizeOverlaps` employed to compute the read counts for each feature. The possible mode includes "Union", "IntersectionStrict", and "IntersectionNotEmpty". All three modes avoid double counting the reads by either discarding reads that completely fall into multiple features or counting the read only once for the feature that uniquely and completely includes it. Please refer to `summarizeOverlaps` for details.
`featureGRanges`	GRanges of features as an optional argument for function to compute RPKM/FPKM just for those features without retrieving online annotations.
`idType`	A character string that specifies the type of the annotations, which can "ensembl_transcript_id", "ensembl_gene_id", "ucsc", etc. Refer to `listFilters` for more information.
`featureType`	Features that will be groupped by genes/transcripts in a GRangesList. The available options are "exon" (Default), "intron", "fiveUTR", "threeUTR", and "CDS" corresponding to the functions `exonsBy`, `cdsBy`, `intronsByTranscript`, `fiveUTRsByTranscript`, `threeUTRsByTranscript`, and `cdsBy`, respectively.
`ignore.strand`	Whether to ignore strand when counting the reads (Default: FALSE).
`txDbName`	Name of the transcript database to use to retreive the annotation. The available options are "biomart" (Default) or "UCSC" corresponding to the functions `makeTxDbFromBiomart` and `makeTxDbFromUCSC`, respectively.
`moreGeneInfo`	Binary indicator to indicate whether to download more information for each genes/transcripts rather than having only the gene/transcript IDs (Default: FALSE).
`saveData`	Path of output file.
`justRPKM`	Binary for whether to return only the `RangedSummarizedExperiment`.
`...`	Extra arguments passed to functions `makeTxDbFromBiomart`, `makeTxDbFromUCSC`, `useMart`, `combineAlignGals`.

The function is a wrapper function making use of several external functions from several well maintained and freely available Bioconductor packages including GenomicFeatures, GenomicRanges, biomaRt and Rsamtools packages. The paired-end alignments are converted into single-end using function galp2gal and then subject to read count computation by summarizeOverlaps, which does not yet directly support paired-end alignments.

`rpkmSEobject`	A `RangedSummarizedExperiment` object with `assays` slot saved for counts, `rowRanges` holds the features, `metadata` for RPKM/FPKM (normalized) gene expression.
`rpkmDF`	Data frame with or without the detailed gene information columns depending on whether `moreGeneInfo` is TRUE or FALSE. `rpkmDF` is only returned within in a list when `justRPKM` is FALSE.
`featureGRanges`	The features in GRanges object that are used to compute the gene expression. `featureGRanges` is only returned within in a list when `justRPKM` is FALSE.

Also works for RNA-seq alignments.

Yue Li

M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar and M. Lawrence. GenomicFeatures: Tools for making and manipulating transcript centric annotations. R package version 1.8.2.

P. Aboyoun, H. Pages and M. Lawrence (). GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.8.9.

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).

Martin Morgan and Herv\'e Pag\'es (). Rsamtools: Binary alignment (BAM), variant call (BCF), or tabix file import. R package version 1.8.5. http://bioconductor.org/packages/release/bioc/html/Rsamtools.html

makeTxDbFromBiomart, makeTxDbFromUCSC, useMart, exonsBy, cdsBy, intronsByTranscript, fiveUTRsByTranscript, threeUTRsByTranscript, cdsBy, combineAlignGals, summarizeOverlaps, ScanBamParam, readGAlignmentPairs, readGAlignments

if(interactive()) { # need internet connection
	
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# use biomart
txDbName <- "biomart"
biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

resultlist <- computeRPKM(bamFiles=grep(pattern="SRR039214", 
        bamFiles, value=TRUE, invert=TRUE), #featureGRanges=featureGRanges, 
				dataset=dataset, moreGeneInfo=TRUE, justRPKM=FALSE,
				idType="ensembl_transcript_id", txDbName=txDbName, 
				biomart=biomart, host=host, by="tx")
		
}