validObjects: Functions to check for appropriate data structures


Description

For efficiency, the mcCNV package does not currently formalize data structures using the S4 system. Instead, it uses data.table objects with specific required fields.

Usage


Interval objects

Interval objects define the intervals for computing copy number; inspired by the GenomicRanges package, they require 'seqnames', 'start', and 'end'. The start and end fields must be integers, and every end value must be greater than its corresponding start value. Finally, 'seqnames' cannot contain the semicolon (';'), colon (':'), or dash ('-') characters. These characters are used in subsequent processing steps to define and combine intervals, e.g. "seq:1-10;seq:15-25" would represent a combined interval consisting of positions 1-10 and 15-25 on 'seq'.
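The requirements above can be sketched as a small check. This is only an illustration: isValidIntervalSketch is a hypothetical helper, not a function exported by mcCNV, and it assumes that the start/end rule applies row by row.

```r
library(data.table)

# Hypothetical helper (not part of mcCNV): returns TRUE when `x` meets the
# interval-object requirements described above.
isValidIntervalSketch <- function(x) {
  required <- c("seqnames", "start", "end")
  if (!is.data.table(x) || !all(required %in% names(x))) return(FALSE)
  if (!is.integer(x$start) || !is.integer(x$end)) return(FALSE)
  # Assumption: each interval's end must exceed its own start
  if (any(x$end <= x$start)) return(FALSE)
  # ';', ':' and '-' are reserved for naming combined intervals
  !any(grepl("[;:-]", as.character(x$seqnames)))
}

good <- data.table(seqnames = c("seq", "seq"),
                   start = c(1L, 15L),
                   end = c(10L, 25L))
bad <- data.table(seqnames = "chr1-alt", start = 1L, end = 10L)

isValidIntervalSketch(good)
isValidIntervalSketch(bad)

# Rebuild the combined-interval label used as the example in the text
combined <- good[, paste(sprintf("%s:%d-%d", seqnames, start, end),
                         collapse = ";")]
combined
```

The second table is rejected only because its 'seqnames' value contains a dash, which would make a label such as "chr1-alt:1-10" ambiguous to parse.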

Converting a GRanges object with as.data.table will create a valid interval object.

Of note, the mcCNV package uses 1-based positions (the standard for R programming) for both the start and end positions. fread can typically load BED files directly, but the BED file specification uses 0-based start and 1-based end positions. Users are responsible for ensuring the start and end positions are converted appropriately, e.g. interval[ , start := start + 1L] (the integer literal 1L keeps 'start' an integer column, as required above).
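As a minimal sketch of that conversion, the following reads BED-style text with fread and shifts the start positions. The BED content is made up for illustration, and the column names are assigned by hand because BED files carry no header.

```r
library(data.table)

# Hypothetical three-column BED content (0-based start, 1-based end)
bedText <- "chr1\t0\t100\nchr1\t150\t300\nchr2\t49\t200\n"

interval <- fread(text = bedText, header = FALSE,
                  col.names = c("seqnames", "start", "end"))

# Shift the 0-based BED start to a 1-based start; 1L keeps the column integer
interval[, start := start + 1L]

interval
```

The end column is left untouched, since a BED end position already matches the 1-based, inclusive end used here.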

Count objects

Count objects list the molecule counts over the given intervals. Count objects require the same fields and specifications as interval objects (listed above). Additionally, count objects have two integer fields: 'molCount', giving the number of molecules overlapping the interval, and 'nCoverMult', giving the number of those molecules that overlapped more than one interval. Finally, count objects have a 'subject' field identifying the subject represented. Count objects are, by default, in long format and can be combined using the data.table function rbindlist. Count objects are created by cnvGetCounts.
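To illustrate the layout, the sketch below builds two single-subject count tables by hand and stacks them with rbindlist. All values and subject names are invented; in practice the tables would come from cnvGetCounts, which is not called here.

```r
library(data.table)

# Hypothetical counts for two subjects over the same two intervals;
# real count objects would be produced by cnvGetCounts
s1 <- data.table(seqnames = "seq", start = c(1L, 15L), end = c(10L, 25L),
                 molCount = c(120L, 85L), nCoverMult = c(3L, 1L),
                 subject = "subject1")
s2 <- data.table(seqnames = "seq", start = c(1L, 15L), end = c(10L, 25L),
                 molCount = c(98L, 110L), nCoverMult = c(0L, 2L),
                 subject = "subject2")

# Stack the per-subject tables into one long, multiple-subject table
counts <- rbindlist(list(s1, s2))

# Long format makes per-subject summaries a simple grouped operation
totals <- counts[, .(totalMol = sum(molCount)), by = subject]
totals
```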

We provide the cnvGatherCounts convenience function for reading saved count objects and combining them into the single multiple-subject count object required by cnvCallCN.

Call objects

Call objects list the estimated copy number for each interval.


daynefiler/mcCNV documentation built on Dec. 15, 2021, 3:58 a.m.