rulebaseRIPSeek: Compute the RPKM and foldchange between two conditions for...
In gorillayue/RIPSeeker: RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments

Description Usage Arguments Details Value Note Author(s) References See Also Examples

The function takes alignments in two conditions (with replicates) as input and computes the gene expression in the two conditions in the unit of RPKM (reads per kilobase of exon per million mapped reads) or FPKM for paired-end alignments (where "F" stands for the fragment the mate-pairs are derived from), and then the foldchange ratio of the RPKM of each gene in RIP or treatment condition in general over control condition. The control files (i.e. the denominator in the foldchange ratio) is specified by user in the "cNAME" argument.

1
2
3

rulebaseRIPSeek(bamFiles, cNAME, featureGRanges, rpkmCutoff = 0.4, 
	fcCutoff = 3, moreRIPGeneInfo = TRUE, idType = "ensembl_transcript_id", 
	myMin = .Machine$double.xmin, saveRData, ...)

`bamFiles`	A list of one or more BAM/SAM/BED alignment files.
`cNAME`	An identifer pattern found in the control alignment files. Once specified, these files will be used as control as the denomenator of the foldchange ratio and the remaining files as RIP, the numarator of the foldchange ratio.
`featureGRanges`	GRanges of features as an optional argument for function to compute RPKM/FPKM just for those features without retrieving online annotations.
`rpkmCutoff`	Cutoff for RPKM in RIP above which the genes will be reported if the `fcCutoff` is also satisfied (Default: 0.4).
`fcCutoff`	Cutoff for foldchange in RIP relative to the control above which the genes will be reported if the `rpkmCutoff` is also satisfied (Default: 3).
`moreRIPGeneInfo`	Binary indicator to indicate whether to download more information for each genes/transcripts rather than having only the gene/transcript IDs (Default: TRUE).
`idType`	A character string that specifies the type of the annotations, which can "ensembl_transcript_id" (Default), "ensembl_gene_id", "ucsc", etc. Refer to `listFilters` for more information.
`myMin`	Add a small value to both the numerator and denomenator as "pseudocount" to prevent the case where the denomenator is zero and the ratio becomes infinity regardless the value of the numerator (Default: `.Machine$double.xmin`).
`saveRData`	Path of output RData and tab-delim results.
`...`	Extra arguments passed to `computeRPKM` and/or`useMart`.

The function uses computeRPKM to download annotation and compute RPKM/FPKM of the annotated genes in the list of files. The alignments file are separated into control as identified by the "cNAME" and the RIP (or any treatment) that do not have the cNAME in their file names. The alignments in either group are pooled together. If moreRIPGeneInfo is specified, the function witll query the Ensembl database. The chromosome ID in the numerical format used in Ensembl is prefixed with "chr" and the strand 1 and -1 converted to + and - for convenience.

A list containing the following items:

`nRPKM`	RPKM of genes in RIP or treatment condition ('n' stands for numerator in the foldchange ratio).
`dRPKM`	RPKM of genes in control condition ('d' stands for denomenator in the foldchange ratio)
`rpkmDF`	Data frame containing read count, RPKM for the RIP (or treatment) and control, foldchange, and optional gene information including "chromosome_name", "start_position", "end_position", "strand", "external_gene_id", "ensembl_transcript_id", "ensembl_gene_id", "ucsc", "description"
`rpkmCutoff`	Cutoff used for RPKM as book keeping value.
`fcCutoff`	Cutoff used for foldchange as book keeping value.
`featureGRanges`	`GRanges` object of the features for which the RPKM and foldchange are computed.

Also works for RNA-seq alignments.

Yue Li

Zhao, J., Ohsumi, T. K., Kung, J. T., Ogawa, Y., Grau, D. J., Sarma, K., Song, J. J., et al. (2010). Genome-wide Identification of Polycomb-Associated RNAs by RIP-seq. Molecular Cell, 40(6), 939D953. doi:10.1016/j.molcel.2010.12.011

M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar and M. Lawrence. GenomicFeatures: Tools for making and manipulating transcript centric annotations. R package version 1.8.2.

P. Aboyoun, H. Pages and M. Lawrence (). GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.8.9.

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).

Martin Morgan and Herv\'e Pag\'es (). Rsamtools: Binary alignment (BAM), variant call (BCF), or tabix file import. R package version 1.8.5. http://bioconductor.org/packages/release/bioc/html/Rsamtools.html

makeTxDbFromBiomart, makeTxDbFromUCSC, useMart, exonsBy, cdsBy, intronsByTranscript, fiveUTRsByTranscript, threeUTRsByTranscript, cdsBy, combineAlignGals, summarizeOverlaps, ScanBamParam, readGAlignmentPairs, readGAlignments

if(interactive()) {
	
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

cNAME <- "SRR039214" 						# specify control name

# output file directory
outDir <- paste(getwd(), "ripSeek_example")

# use biomart
txDbName <- "biomart"
biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

resultlist <- rulebaseRIPSeek(bamFiles, "SRR039214", dataset=dataset, 
		txDbName=txDbName, biomart=biomart, host=host, by="tx")

}