mainSeek: Train HMM parameters on each chromosome independently from the...


View source: R/mainSeek.R

Description

A back-end function used by the front-end function ripSeek to train HMM parameters on all of the chromosomes independently. This function in turn calls mainSeekSingleChrom to compute HMM parameters on each chromosome, either sequentially or in parallel (if multicore is TRUE).

Usage

mainSeek(bamFiles, reverseComplement = FALSE,
    genomeBuild = "mm9", uniqueHit = TRUE,
    assignMultihits = TRUE, strandType = NULL,
    paired = FALSE, rerunWithDisambiguatedMultihits = TRUE,
    silentMain = FALSE, multicore = TRUE,
    returnAllResults = TRUE, ...)

Arguments

bamFiles

A list of paths to individual BAM files. BED and SAM files are also accepted.

reverseComplement

Whether the reads came from the original or the opposite strand of the RNA being sequenced. If the former, reverseComplement should be FALSE; otherwise it should be TRUE, in which case the strand signs are switched from + to - and from - to +, while * is left unchanged.
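The strand switch described above can be pictured with a small base R sketch. The helper flipStrand is hypothetical and is not a RIPSeeker function; it only illustrates the mapping of +, - and * on a plain character vector.

```r
# Hypothetical helper (not part of RIPSeeker): swap strand signs.
# "+" becomes "-", "-" becomes "+", and "*" is left unchanged.
flipStrand <- function(strand) {
  lookup <- c("+" = "-", "-" = "+", "*" = "*")
  unname(lookup[as.character(strand)])
}

strands <- c("+", "-", "*", "+")
flipped <- flipStrand(strands)
print(flipped)
```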

genomeBuild

genomeBuild is only required when the input alignment format is BED, in which case getAlignGal uses it to determine the chromosome lengths for the GAlignments object via the function SeqinfoForUCSCGenome. BAM and SAM headers already contain chromosome information, so genomeBuild is not needed for those formats.

uniqueHit

Binary indicator. If uniqueHit=TRUE, only reads mapped to a single unique locus are used to train the HMM. Otherwise, all of the reads, including multihits, are used for the HMM. A multihit is a read mapped to more than one locus. The flags for unique hits and multihits are stored as metadata values of the GAlignments object constructed in getAlignGal.

assignMultihits

Binary indicator used by ripSeek to tell the function whether to disambiguate multihits by assigning each of them to the unique locus with the maximum posterior probability obtained from running the HMM (see nbh_em).
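The idea of assigning a multihit to its most probable locus can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names and the posterior values are all made up, and the code is not RIPSeeker's implementation.

```r
# Toy data (all values invented): each row is one candidate locus for a read.
# The posterior column stands in for whatever probability the HMM reports.
hits <- data.frame(
  read      = c("r1", "r2", "r2", "r3", "r3", "r3"),
  locus     = c("chr1:100", "chr1:500", "chr2:900", "chrX:10", "chr3:70", "chr5:42"),
  posterior = c(0.90, 0.20, 0.75, 0.10, 0.60, 0.30),
  stringsAsFactors = FALSE
)

# For every read, keep the candidate locus with the highest posterior.
bestPerRead <- lapply(split(hits, hits$read), function(d) d[which.max(d$posterior), ])
assigned <- do.call(rbind, bestPerRead)
rownames(assigned) <- NULL
print(assigned)
```

Reads with a single candidate locus (r1 here) pass through unchanged, so the same step can be applied to unique hits and multihits alike.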

strandType

A character variable indicating which strand RIPSeeker should operate on. The options are NULL, '+', '-' and '*'. If NULL or '*', all of the reads are used (preferable for non-strand-specific sequencing). If '+' or '-', only reads from the '+' or '-' strand are used, respectively. Note that the sign is assumed to be the same as the strand sign of the processed alignment object, and it will be the opposite of the sign in the input alignments if reverseComplement is TRUE (see reverseComplement above).
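As a rough illustration of these options, the base R sketch below selects reads from a toy strand vector. The function selectByStrand is hypothetical and is not part of RIPSeeker; it assumes the strand signs have already been processed (including any switch requested by reverseComplement).

```r
# Hypothetical helper (not a RIPSeeker function): return the indices of the
# reads to keep for a given strandType.
selectByStrand <- function(strands, strandType = NULL) {
  # NULL or "*" means that all reads are used
  if (is.null(strandType) || strandType == "*") {
    return(seq_along(strands))
  }
  # otherwise keep only the reads on the requested strand
  which(strands == strandType)
}

strands <- c("+", "-", "*", "-")
allReads   <- selectByStrand(strands)        # strandType = NULL
minusReads <- selectByStrand(strands, "-")
print(minusReads)
```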

paired

Binary indicator of whether the library is paired-end (TRUE) or single-end (FALSE, the default) (see getAlignGal).

rerunWithDisambiguatedMultihits

After multihits have been assigned to unique loci, rerunWithDisambiguatedMultihits (default: TRUE) indicates whether to re-run the HMM on the augmented read alignment data. If FALSE, the HMM step is not re-run, and the workflow proceeds to RIP detection (see seekRIP) using the non-disambiguated alignments, which are either the alignments containing only the unique hits (if uniqueHit=TRUE) or the alignments containing both the unique hits and the multihits (if uniqueHit=FALSE).

silentMain

Binary indicator of whether to suppress the verbose output from the mainSeekSingleChrom function. If FALSE (the default), the progress of the EM training is printed to the console so that the user can keep track of it.

multicore

Binary indicator of whether to use the mclapply function to compute the HMM on chromosomes in parallel. Running in parallel can speed up the computation by a factor proportional to the total number of CPU cores on the machine, but may impose a larger memory overhead than the single-threaded approach.
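The switch between parallel and sequential processing can be sketched with the parallel package that ships with R. The function fitOneChrom is a hypothetical stand-in for the per-chromosome work (which RIPSeeker delegates to mainSeekSingleChrom), and the choice of two cores is an arbitrary assumption for the example.

```r
library(parallel)

# Hypothetical stand-in for the per-chromosome HMM training step.
fitOneChrom <- function(chrom) {
  paste("fitted", chrom)
}

chroms <- c("chr1", "chr2", "chrX")
multicore <- TRUE

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
nCores <- if (multicore && .Platform$OS.type != "windows") 2L else 1L

results <- if (multicore) {
  mclapply(chroms, fitOneChrom, mc.cores = nCores)
} else {
  lapply(chroms, fitOneChrom)
}
names(results) <- chroms
print(results)
```

Because each worker holds its own copy of the data it processes, memory use tends to grow with the number of cores, which is the overhead mentioned above.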

returnAllResults

Binary indicator of whether to return all results (the trained HMM parameters plus the original and disambiguated GAlignments) or just the HMM results.

...

Arguments passed to mainSeekSingleChrom.

Value

A list containing:

nbhGRList

A GRangesList in which each item contains the HMM training results for a single chromosome.

alignGal

Original alignment data in a GAlignments object.

alignGalFiltered

Disambiguated alignment data with multihits assigned to unique loci.

Author(s)

Yue Li

See Also

ripSeek, mainSeekSingleChrom, mclapply

Examples

library(RIPSeeker)

# Retrieve the example BAM files shipped with the package
extdata.dir <- system.file("extdata", package = "RIPSeeker")

bamFiles <- list.files(extdata.dir, "\\.bam$", recursive = TRUE, full.names = TRUE)

bamFiles <- grep("PRC2", bamFiles, value = TRUE)

# Parameter settings
binSize <- 1e5        # use a large fixed bin size for demo only
minBinSize <- NULL    # min bin size in automatic bin size selection
maxBinSize <- NULL    # max bin size in automatic bin size selection
multicore <- FALSE    # process chromosomes sequentially (TRUE uses multiple cores)
strandType <- "-"     # set strand type to minus strand

# Run mainSeek for HMM inference on all chromosomes
# (mainSeek calls mainSeekSingleChrom on each chromosome)
mainSeekOut <- mainSeek(
    bamFiles = grep(pattern = "SRR039214", bamFiles, value = TRUE, invert = TRUE),
    binSize = binSize, minBinSize = minBinSize,
    maxBinSize = maxBinSize, strandType = strandType,
    reverseComplement = TRUE, genomeBuild = "mm9",
    uniqueHit = TRUE, assignMultihits = TRUE,
    rerunWithDisambiguatedMultihits = FALSE,
    multicore = multicore, silentMain = FALSE, verbose = TRUE)

RIPSeeker documentation built on Oct. 31, 2019, 7:29 a.m.