Parsing and sorting of uncorrected read and sequence information files

Description

Loads WIG files containing readcount, GC, and mappability data for non-overlapping windows of fixed length (i.e. bins), and returns a structure ready to be used for readcount correction. See Details for the assumptions made about the files.

Usage

wigsToRangedData(readfile, gcfile, mapfile, verbose = FALSE)

Arguments

readfile

Pathname to WIG file containing readcounts per bin.

gcfile

Pathname to WIG file containing GC content per bin.

mapfile

Pathname to WIG file containing average mappability per bin.

verbose

Set to TRUE if messages are desired.

Details

The three input files are expected to contain the same number of lines, although the order and names of chromosomes need not match across files. Chromosome lengths must be identical across the files and unique within each file; if the lengths are not unique, the chromosomes must appear in the same order in all three files.
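The line-count expectation above can be checked before loading. The following is a minimal sketch in base R, assuming the inputs are fixedStep WIG files as described in the UCSC format reference; `wig_summary` and `same_line_count` are hypothetical helper names and are not part of HMMcopy.

```r
# Hypothetical helper (not part of HMMcopy): summarise a WIG file.
# Assumes fixedStep declaration lines such as:
#   fixedStep chrom=chr1 start=1 step=1000 span=1000
wig_summary <- function(path) {
  lines <- readLines(path)
  # Keep only the declaration lines that open each chromosome block
  decl <- grep("^fixedStep", lines, value = TRUE)
  # Extract the chromosome name that follows "chrom="
  chroms <- sub(".*chrom=([^ ]+).*", "\\1", decl)
  list(n_lines = length(lines), chroms = chroms)
}

# Hypothetical helper: TRUE if all given files have the same number of lines.
same_line_count <- function(...) {
  counts <- vapply(list(...), function(p) wig_summary(p)$n_lines, integer(1))
  length(unique(counts)) == 1
}
```

With the paths from the Examples section, `same_line_count(rfile, gfile, mfile)` would report whether the three files agree, and `wig_summary(rfile)$chroms` would list the chromosome names in file order.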

At present, the three WIG files are expected to be generated by external programs, namely those from the HMMcopy suite (see See Also), rather than by existing R packages, because of space and memory constraints when working with high-coverage full-genome samples.

Value

A RangedData object, where each row entry represents a bin, with the three values from the input as columns named reads, gc, and map.

Author(s)

Daniel Lai

References

correctedReadcount Suite

TBA

WIG

http://genome.ucsc.edu/goldenPath/help/wiggle.html

See Also

correctReadcount, to correct the readcounts in the resultant value.

Examples

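library(HMMcopy)

# Locate the example WIG files shipped with the package
rfile <- system.file("extdata", "tumour.wig", package = "HMMcopy")
gfile <- system.file("extdata", "gc.wig", package = "HMMcopy")
mfile <- system.file("extdata", "map.wig", package = "HMMcopy")

# Combine readcount, GC, and mappability values into one object
uncorrected_reads <- wigsToRangedData(rfile, gfile, mfile)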
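Because `system.file()` returns an empty string when a file cannot be found, it may help to validate the paths before calling `wigsToRangedData`. The sketch below uses base R only; `check_paths` is a hypothetical helper name and is not part of HMMcopy.

```r
# Hypothetical helper (not part of HMMcopy): stop early if any path is
# empty (as system.file() returns for a missing file) or does not exist.
check_paths <- function(...) {
  paths <- c(...)
  bad <- !nzchar(paths) | !file.exists(paths)
  if (any(bad)) {
    stop("Missing input file(s) at position(s): ",
         paste(which(bad), collapse = ", "))
  }
  # Return the paths invisibly so the call can be used inline
  invisible(paths)
}
```

In the example above, `check_paths(rfile, gfile, mfile)` could be called just before `wigsToRangedData(rfile, gfile, mfile)`.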