getCounts: Gets counts from alignment data from a set of genome...
In segmentSeq: Methods for identifying small RNA loci from high-throughput sequencing data


A function for extracting count data from an alignmentData object given a set of segments defined on the genome.

getCounts(segments, aD, preFiltered = FALSE, adjustMultireads = TRUE, useChunk = FALSE, cl)

`segments`	A `GRanges` object which defines a set of segments for which counts are required.
`aD`	An `alignmentData` object.
`preFiltered`	The function internally cleans the data; however, this may not be needed and omitting these steps may save computational time. See Details.
`adjustMultireads`	If working with methylation data, this option toggles an adjustment for reads that align to multiple locations on the genome. Defaults to TRUE.
`useChunk`	If all segments lie within the defined ‘chunks’ of the alignmentData object, setting this to TRUE increases speed. If they do not, setting it to TRUE may give inaccurate counts. Defaults to FALSE.
`cl`	A SNOW cluster object, or NULL. See Details.

The function extracts count data from the alignmentData object 'aD' for a given set of segments. The non-trivial aspect is that when a segment contains a tag that matches multiple places within that segment (and thus appears multiple times in the alignmentData object), the tag should be counted only once.
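The once-per-segment rule can be illustrated with a small base R sketch. It is not the package's implementation: the `alignments` data frame, its columns (`tag`, `start`, `end`, `count`) and the helper `count_segment()` are hypothetical names chosen for illustration. The sketch assumes a single library and assumes that an alignment contributes only when it lies wholly inside the segment.

```r
# Toy alignments for one library: tag "AAA" aligns to two places,
# so it appears twice with the same abundance (count).
alignments <- data.frame(
  tag   = c("AAA", "AAA", "CCC", "GGG"),
  start = c(10, 60, 30, 500),
  end   = c(30, 80, 50, 520),
  count = c(5, 5, 2, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum tag abundances within [seg_start, seg_end],
# counting each distinct tag only once.
count_segment <- function(aln, seg_start, seg_end) {
  # Keep alignments lying wholly inside the segment (assumption of this sketch).
  inside <- aln[aln$start >= seg_start & aln$end <= seg_end, ]
  # Drop repeated rows for the same tag before summing.
  inside <- inside[!duplicated(inside$tag), ]
  sum(inside$count)
}

# Naive sum over every alignment row in the segment 1..100.
naive <- sum(alignments$count[alignments$start >= 1 & alignments$end <= 100])

# Sum over the same segment with each tag counted once.
deduped <- count_segment(alignments, 1, 100)
```

Comparing `naive` with `deduped` shows how much a multiply-matching tag would inflate the count if its repeated rows were simply summed.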

If preFiltered = FALSE then the function allows for missing (NA) data in the segments, unordered segments and duplicated segments. If the segment list has no missing data, is already ordered, and contains no duplications, then computational time can be saved by setting preFiltered = TRUE.
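As a rough illustration of the three conditions above, the following base R sketch cleans a toy segment table. It uses a plain data frame rather than a `GRanges` object, and the helper `clean_segments()` and the column names are hypothetical; it does not reproduce the function's internal cleaning steps.

```r
# Toy segment table with a missing value, a duplicate, and unsorted rows.
segs <- data.frame(
  chr   = c("Chr1", "Chr1", NA, "Chr1", "Chr1"),
  start = c(2000, 1, 50, 1, 100),
  end   = c(5000, 40, 90, 40, 3000),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the three conditions described above.
clean_segments <- function(s) {
  s <- s[complete.cases(s), ]             # drop rows with missing (NA) data
  s <- s[!duplicated(s), ]                # drop duplicated segments
  s <- s[order(s$chr, s$start, s$end), ]  # order by chromosome, then position
  rownames(s) <- NULL
  s
}

cleaned <- clean_segments(segs)
```

A segment set that already satisfies all three conditions is the case in which preFiltered = TRUE would be appropriate.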

A cluster object (package: snow) is recommended for parallelisation of this function when using large data sets. Passing NULL as the 'cl' argument will cause the function to run in non-parallel mode.
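A minimal sketch of the cluster lifecycle follows, assuming that a socket cluster created with the base `parallel` package (whose cluster code derives from snow) is acceptable as the 'cl' argument. The worker count of 2 and the toy task are arbitrary choices for illustration, and the getCounts call is left as a comment because it needs the objects built in the Examples.

```r
library(parallel)

# Start a small socket cluster; the worker count of 2 is an arbitrary choice.
cl <- makeCluster(2)

# Toy parallel task to confirm that the workers respond.
squares <- parLapply(cl, 1:4, function(i) i^2)

# With segmentSeq loaded, the same object would be supplied as 'cl', e.g.:
# getCounts(segments = segments, aD = aD, cl = cl)

# Release the worker processes when finished.
stopCluster(cl)
```

Stopping the cluster explicitly avoids leaving idle worker processes behind after the counts have been computed.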

In general, most users will not need to call this function directly, as the processAD function includes a call to getCounts as part of the standard processing of an alignmentData object into a segData object.

If ‘as.matrix’, a matrix, each column of which corresponds to a library in the alignmentData object ‘aD’ and each row to the segment defined by the corresponding row in ‘segments’. Otherwise an equivalent DataFrame object.

Thomas J. Hardcastle

processAD

# Load the package that provides readGeneric and getCounts.

library(segmentSeq)

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)

# Process the files to produce an 'alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         gap = 100)

# Get count data for three arbitrarily chosen segments on chromosome 1.

getCounts(segments = GRanges(seqnames = c(">Chr1"),
          IRanges(start = c(1,100,2000), end = c(40,3000,5000))), 
          aD = alignData, cl = NULL)