heuristicSeg: A (fast) heuristic method for creation of a genome segment...
In segmentSeq: Methods for identifying small RNA loci from high-throughput sequencing data

Description Usage Arguments Details Value Author(s) References See Also Examples

This method identifies by heuristic methods a set of loci from a segData or segMeth object. It does this by identifying within replicate groups regions of the genome that satisfy the criteria for being a locus and have no region within them that satisfies the criteria for being a null. These criteria can be defined by the user or inferred from the data.

heuristicSeg(sD, aD, gap = 50, RKPM = 1000, prop, coverage = 1, locCutoff =
0.9, nullCutoff = 0.9, subRegion = NULL, largeness = 1e8, getLikes =
TRUE, verbose = TRUE, tempDir = NULL, cl = NULL, recoverFromTemp =
FALSE, trimMeth = FALSE)

`aD`	An `alignmentData` or `methData` object.
`sD`	A `segData` or `segMeth` object derived from the ‘aD’ object.

gap

What is the minimum length of a null region?

RKPM

For analysis of a segData object, what RKPM (reads per kilobase per million reads) distinguishes between a locus and a null region?

`prop`	For analysis of a `segMeth` object, what proportion of methylated cytosines distinguishes between a locus and a null region? (see Details).
`coverage`	For analysis of a `segMeth` object, what is the minimum coverage required to make inferences on the presence/absense of a methylation locus?
`locCutoff`	For analysis of a `segMeth` object, with what likelihood must the proportion of methylated cytosines exceed the ‘prop’ option to define a locus? Defaults to 0.9.
`nullCutoff`	For analysis of a `segMeth` object, with what likelihood must the proportion of methylated cytosines be less than the ‘prop’ option to define a null region? Defaults to 0.9.
`subRegion`	A `'data.frame'` object defining the subregions of the genome to be segmented. If NULL (default), the whole genome is segmented.
`largeness`	The maximum size for a split analysis.
`getLikes`	Should posterior likelihoods for the new segmented genome (loci and nulls) be assessed?
`verbose`	Should the function be verbose? Defaults to TRUE.
`tempDir`	A directory for storing temporary files produced during the segmentation.
`cl`	A SNOW cluster object, or NULL. Defaults to NULL. See Details.
`recoverFromTemp`	If TRUE, will attempt to recover the position saved in 'tempDir'. Defaults to FALSE. See Details.
`trimMeth`	Should putative methylation regions be trimmed? Defaults to FALSE; see Details.

A 'cluster' object (package: snow) may be used for parallelisation of parts of this function when examining large data sets. Passing NULL to this variable will cause the function to run in non-parallel mode.

If recoverFromTemp = TRUE, the function will attempt to recover a crashed position from the temporary files in tempDir. At present, the function assumes you know what you are doing, and will perform no checking that these files are suitable for the specified recovery. Use with caution.

The prop variable can be used to set the proportion of methylation required to identify a locus by giving a numerical value between 0-1. It can also be determined automatically (see thresholdFinder).

Due to the way that methylation loci are identified, it is possible that the cytosines at the edges of methylation loci have limited evidence for methylation. The 'trimMeth' option trims cytosines at the edge of predicted methylation loci that have less than 50% likelihood of being above the required threshold.

A lociData object, containing count information on all the segments discovered.

Thomas J. Hardcastle

Hardcastle T.J., Kelly, K.A. and Balcombe D.C. (2011). Identifying small RNA loci from high-throughput sequencing data. In press.

classifySeg, an alternative approach to this problem using an empirical Bayes approach to classify segments. thresholdFinder, a function for determining a suitable threshold on methylation by examining the data. plotGenome, a function for plotting the alignment of tags to the genome (together with the segments defined by this function). baySeq, a package for discovering differential expression in lociData objects.

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)

# Process the files to produce an `alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir, replicates =
replicates, libnames = libnames, gap = 100)

# Process the alignmentData object to produce a `segData' object.

sD <- processAD(alignData, gap = 100, cl = NULL)

# Use the segData object to produce a segmentation of the genome.

segD <- heuristicSeg(sD = sD, aD = alignData, prop = 0.2,
subRegion = data.frame(chr = ">Chr1", start = 1, end = 1e5),
cl = NULL)