seekRIP: Identify significant peaks

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/seekRIP.R

Description

Based on the posteriors derived from HMM by mainSeek, find the significant RIP regions derived from merging the adjacent RIP bins. The significance test makes use of the log odd ratio of the posteriors for RIP over the background states.

Usage

1
2
3
seekRIP(nbhGRRIP, nbhGRCTL = NULL, padjMethod = "BH", 
	logOddCutoff = -Inf, pvalCutoff = 1, pvalAdjCutoff = 1, 
	eFDRCutoff = 1)

Arguments

nbhGRRIP

GRanges object for the RIP library created from mainSeek containing the posterior probabilities of the hidden states for each observed read count.

nbhGRCTL

An optional argument as a GRanges object for the control library created from mainSeek containing the posterior probabilities of the hidden states for each observed read count.

padjMethod

Method used to adjust multiple testing performed in p.adjust (Default: "BH").

logOddCutoff

Threshold for the log odd ratio of posterior for the RIP over the background states (See seekRIP). Only peaks with logOdd score greater than the logOddCutoff will be reported. Default: -Inf (i.e. no cutoff).

pvalCutoff

Threshold for the adjusted p-value for the logOdd score. Only peaks with adjusted p-value less than the pvalAdjCutoff will be reported. Default: 1 (i.e. no cutoff).

pvalAdjCutoff

Threshold for the adjusted p-value for the logOdd score. Only peaks with adjusted p-value less than the pvalAdjCutoff will be reported. Default: 1 (i.e. no cutoff).

eFDRCutoff

Threshold for the empirical false discovery rate (eFDR). Only peaks with eFDR less than the pvalAdjCutoff will be reported. Default: 1 (i.e. no cutoff).

Details

The RIPScore is compupted in computeLogOdd as the log odd ratio of the posterior for the RIP state (z_{i} = 2) over the posterior for the background state (z_{i} = 1) in RIP library. When control is available, the RIPScore is updated by the difference between the RIPScores between RIP and control. The adjacent bins with hidden states predicted by nbh_vit as the enriched state (corresponding to the NB with larger mean) are merged. The RIPSscores are averaged over the merged bins. To assess the statistical significance of the RIPScore for each region, we assume that the RIPScore follows a Gaussian (Normal) distribution with mean and standard deviation estimated using the RIPScores over all of the bins. The rationale is based on the assumption that most of the RIPScores correspond to the background state and together contribute to a stable estimate of the test statistics (TS) and p-value computed using the R built-in function pnorm.

The p-value is adjusted by p.adjust with BH method by default. The same procedure is applied optionally to the control library. Only when the control is available, is an empirical false discovery rate (eFDR) estimated based on the idea of "sample swap" inspired by MACS (a ChIP-seq algorithm from Zhange el al. (2008). At each p-value, RIPSeeker finds the number of significnat RIP-regions over control (CTL) based on pvalRIP and the number of significant control regions over RIP based on pvalCTL. The eFDR is defined as the ratio of the number of "RIP" (false positive) regions identified from CTL-RIP comparison over the number of RIP regions from the RIP-CTL comparison. The maximum value for eFDR is 1 and minimum value for eFDR is max(p-value, 0). The former takes care of the case where the numerator is bigger than the denominator, and the latter for zero numerator.

Value

GRanges object containing the merged bins with values slot saved for RIPScore (lodOdd), p-value (pval), adjusted p-value (pvalAdj) for RIP and optionally for control.

Note

Internal function used by ripSeek.

Author(s)

Yue Li

References

Yong Zhang, Tao Liu, Clifford A Meyer, J\'er\^ome Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, Richard M Myers, Myles Brown, Wei Li, and X Shirley Liu. Model-based analysis of ChIP-Seq (MACS). Genome Biology, 9(9):R137, 2008.

See Also

logScoreWithControl, logScoreWithoutControl, empiricalFDR, computeLogOdd, scoreMergedBins, ripSeek

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
if(interactive()) {
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- 1e5        # use a large fixed bin size for demo only
multicore <- FALSE    # use multicore
strandType <- "-"			# set strand type to minus strand

################ run main function for HMM inference on all chromosomes ################
mainSeekOutputRIP <- mainSeek(bamFiles=grep(pattern="SRR039214", 
    bamFiles, value=TRUE, invert=TRUE),
		binSize=binSize, strandType=strandType, 		
		reverseComplement=TRUE, genomeBuild="mm9",
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = FALSE,				
		multicore=multicore, silentMain=FALSE, verbose=TRUE)
		
mainSeekOutputCTL <- mainSeek(bamFiles=grep(pattern="SRR039214", 
    bamFiles, value=TRUE, invert=FALSE),
		binSize=binSize, strandType=strandType, 		
		reverseComplement=TRUE, genomeBuild="mm9",
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = FALSE,				
		multicore=multicore, silentMain=FALSE, verbose=TRUE)

# with control
ripGR.wicontrol <- seekRIP(mainSeekOutputRIP$nbhGRList$chrX, mainSeekOutputCTL$nbhGRList)

# without control
ripGR.wocontrol <- seekRIP(mainSeekOutputRIP$nbhGRList$chrX)
}

RIPSeeker documentation built on Oct. 31, 2019, 7:29 a.m.