scoreMergedBins: Average log odd scores over bins being merged into a single...
In gorillayue/RIPSeeker: RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments

Description Usage Arguments Details Value Note Author(s) See Also Examples

Sum, normalize the read counts, and average the logOdd score over the bins being merged into a single enriced region.

1	scoreMergedBins(findOverlapsHits, unmergedGRAll, mergedGRAll)

`findOverlapsHits`	Output from `findOverlaps` as two columns indices with the first column containing the indices for unmerged GRanges and the second column the indices of the matched merged GRanges.
`unmergedGRAll`	GRanges before merging.
`mergedGRAll`	GRanges after merging.

The consecutive RIP-bins predicted by the Viterbi function (See nbh_vit) are merged into a single RIP region. An aggregate RIPScore as the averaged RIPScores over the associated merged bins is assigned to each merged RIP region. In the RIPSeeker workflow, the averaged RIPScore then becomes the representative score for the region and subject to significance test carried out in seekRIP.

A merged GRanges each with scores including summed read count, averaged log odd scores, and FPK (fragment per kilobase of region length). The latter score represent a normalized read count.

This function is expected to be called only from logScoreWithoutControl and logScoreWithControl.

Yue Li

seekRIP, computeLogOdd logScoreWithControl, logScoreWithoutControl

if(interactive()) { # see example in seekRIP
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- 1e5					# use a large fixed bin size for demo only
multicore <- FALSE			# use multicore
strandType <- "-"				# set strand type to minus strand

################ run main function for HMM inference on all chromosomes ################
mainSeekOutputRIP <- mainSeek(bamFiles=
    grep(pattern="SRR039214", bamFiles, value=TRUE, invert=TRUE),
		binSize=binSize, strandType=strandType, 		
		reverseComplement=TRUE, genomeBuild="mm9",
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = TRUE,				
		multicore=multicore, silentMain=FALSE, verbose=TRUE)
		

nbhGRRIP <- mainSeekOutputRIP$nbhGRList$chrX

logOddScore <- computeLogOdd(nbhGRRIP)

values(nbhGRRIP) <- cbind(as.data.frame(values(nbhGRRIP)), logOddScore)
	
enrichIdx <- which(values(nbhGRRIP)$viterbi_state == 2)

unmergedRIP <- nbhGRRIP[enrichIdx]	
	
mergedRIP <- reduce(unmergedRIP, min.gapwidth = median(width(unmergedRIP) ))
	
overlapIdx <- findOverlaps(mergedRIP, unmergedRIP)

# a list with query hits as names and subject hits as items
findOverlapsHits <- split(overlapIdx, queryHits(overlapIdx))

# get the score for the first merged region
x <- scoreMergedBins(findOverlapsHits[[1]], unmergedRIP, mergedRIP)

# get scores for all of the merged regions
mergedRIPList <- lapply(split(overlapIdx, queryHits(overlapIdx)),
			
			scoreMergedBins, unmergedRIP, mergedRIP)
	
names(mergedRIPList) <- NULL
	
mergedRIP <- do.call(c, mergedRIPList)

# logOddScore is the averaged logOddScore across merged bins
mergedRIP
}