mainSeekSingleChrom: Automatic bin size selection, bin count, and HMM parameters...

Description Usage Arguments Value Note Author(s) See Also Examples

Description

This an internal function used by mainSeek to accomplish three major tasks on a single chromosome: automatically select bin size, compute read counts within the bins, and obtain optimal HMM paramters.

Usage

1
2
3
4
mainSeekSingleChrom(alignGR, K = 2, binSize = NULL, minReadCount = 10, 
	backupNumBins = 10, minBinSize = 200, maxBinSize = 1200, 
	increment = 5, pathToSavePlotsOfBinSizesVersusCosts, 
	verbose = TRUE, allowSecondAttempt = TRUE, ...)

Arguments

alignGR

GRanges containing the alignments on a single chromosome .

K

Number of hidden states (Default: 2). By default, state 1 specifies the background and state 2 the RIP regions. The two states are recognized by the means for the two distributions (See nbh_em).

binSize

Size to use for binning the read counts across each chromosome. If NULL, optimal bin size within a range (default: minBinSize=200, maxBinSize=1200) will be automatically selected (See selectBinSize).

minReadCount

Minimum aligned read counts needed for HMM to converge (Default: 10). Note that HMM may not converge some times when majority of the read counts are zero even if some read count > 10. When that happens, a back-up function addDummyProb comes in to create a placeholder for the corresponding chromosome in GRangeList to maintain the data structure to preserve all information (successfully) obtained from other chromosomes.

backupNumBins

If read count is less than minReadCount, then use backupNumBins (Default: 10) to bin the chromosome.

minBinSize

Minimum bin size to start with the bin selection (See selectBinSize). Default to 200, common minimum band size selected in RIP or RNA-seq library construction.

maxBinSize

Maximum bin size to stop with the bin selection (See selectBinSize). Default: 1200.

increment

Step-wise increment in bin size selection (See selectBinSize). Default: 5.

pathToSavePlotsOfBinSizesVersusCosts

Directory used to save the diagnostic plots for bin size selection.

verbose

Binary indicator for disable (FALSE) or enable (TRUE) HMM training message from function nbh to output to the console.

allowSecondAttempt

In case HMM fails to converge due to malformed paramters in EM iteraction, re-iterating the HMM process each time with a different suboptimal bin size in attempt to succeed in some trial. If all yeild nothing, fall back up to addDummyProb to return the place holder for the chromosome.

...

Argumnets passed to nbh.

Value

nbhGR

GRanges object containing the optimized HMM parameters (and the Viterbi hidden state sequence) accompanied with the read count vector following the (automatic) binning scheme.

Note

Unless a highly customized workflow is needed, ripSeek is the high-level front-end main function that should be used in most cases.

Author(s)

Yue Li

See Also

ripSeek, mainSeek, nbh_em

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- 1e5							  # use a large fixed bin size for demo only
minBinSize <- NULL						# min bin size in automatic bin size selection
maxBinSize <- NULL						# max bin size in automatic bin size selection
multicore <- FALSE						# use multicore
strandType <- "-"							# set strand type to minus strand

# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

alignGal <- getAlignGal(bamFiles[1], reverseComplement=TRUE, genomeBuild="mm9")

alignGR <- as(alignGal, "GRanges")

alignGRList <- GRangesList(as.list(split(alignGR, seqnames(alignGR))))

################ run main function for HMM inference on a single chromosome ################
nbhGR <- mainSeekSingleChrom(alignGR=alignGRList$chrX, K = 2, binSize=binSize, 
			
			minBinSize = minBinSize, maxBinSize = maxBinSize)
			
nbhGR			

gorillayue/RIPSeeker documentation built on May 17, 2019, 7:59 a.m.