filterData: Filter list object based on read depth and missing data and...
In TitanCNA: Subclonal copy number and LOH prediction from whole genome sequencing of tumours

Description Usage Arguments Details Value Author(s) References See Also Examples

Filters all vectors in the list based on the specified chromosome(s) of interest, minimum and maximum read depths, missing data, and a mappability score threshold.

filterData(data, chrs = NULL, minDepth = 10, maxDepth = 200,
    positionList = NULL, map = NULL, mapThres = 0.9,
    centromeres = NULL, centromere.flankLength = 0)

`data`	`data.table` object that contains an arbitrary number of components. Should include ‘chr’ and ‘tumDepth’. All components must have the same number of rows, where each row corresponds to one chromosomal position.
`chrs`	`character` or vector of `character` specifying the chromosomes to keep. Chromosomes not listed in `chrs` are filtered out. The chromosome naming style must match the `genomeStyle` used when running `loadAlleleCounts`.
`minDepth`	`Numeric integer` specifying the minimum tumour read depth to include. Positions with tumour read depth >= `minDepth` are kept.
`maxDepth`	`Numeric integer` specifying the maximum tumour read depth to include. Positions with tumour read depth <= `maxDepth` are kept.
`positionList`	`data.frame` with two columns: ‘chr’ and ‘posn’. `positionList` lists the chromosomal positions to use in the analysis. All positions not overlapping this list will be excluded. Use `NULL` to use all current positions in `data`.
`map`	`Numeric array` containing map scores corresponding to each position in `data`. Optional for filtering positions based on mappability scores.
`mapThres`	`Numeric float` specifying the mappability score threshold. Only applies if `map` is specified. Positions with `map` scores >= `mapThres` are kept.
`centromeres`	`data.frame` listing the centromere regions. It should contain three columns: ‘chr’, ‘start’, and ‘end’. If this argument is supplied, data at and flanking the centromeres are removed.
`centromere.flankLength`	Integer giving the length (in base pairs) of the region to the left and to the right of each centromere from which data are also removed.
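
The argument descriptions above can be read as a set of row-wise conditions that a position must satisfy to be kept. The following base-R sketch illustrates that reading on a toy table. It is not the TitanCNA implementation: the names `toy`, `cen`, `flank`, `keep`, and `filtered` are hypothetical, the data are made up, and the handling of missing depths and of the flank boundaries (treated as inclusive here) is an assumption.

```r
# Illustrative sketch only; not the TitanCNA source code.
# Toy data with the two columns filterData() expects plus a position column.
toy <- data.frame(
  chr      = c("1", "1", "1", "1", "2", "2", "X"),
  posn     = c(100, 5000, 6500, 9000, 200, 7000, 300),
  tumDepth = c(5, 50, 60, 250, 30, NA, 80),
  stringsAsFactors = FALSE
)
# One mappability score per row of `toy` (same length as nrow(toy)).
map <- c(0.95, 0.99, 0.99, 0.97, 0.50, 0.99, 0.92)

chrs     <- c("1", "2")
minDepth <- 10
maxDepth <- 200
mapThres <- 0.9

# Hypothetical centromere table with the three documented columns.
cen   <- data.frame(chr = "1", start = 6000, end = 7000,
                    stringsAsFactors = FALSE)
flank <- 500  # plays the role of centromere.flankLength

# Chromosome, missing-data, depth, and mappability conditions.
keep <- toy$chr %in% chrs &
  !is.na(toy$tumDepth) &
  toy$tumDepth >= minDepth &
  toy$tumDepth <= maxDepth &
  map >= mapThres

# Drop positions inside [start - flank, end + flank] on the same chromosome.
# Treating both boundaries as inclusive is an assumption of this sketch.
for (i in seq_len(nrow(cen))) {
  inCen <- toy$chr == cen$chr[i] &
    toy$posn >= cen$start[i] - flank &
    toy$posn <= cen$end[i] + flank
  keep <- keep & !inCen
}

filtered <- toy[keep, ]
```

In this toy example only the position at 5000 on chromosome 1 survives: the other rows fail the depth, mappability, chromosome, or missing-data condition, or fall inside the flanked centromere window.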

All vectors in the input data.table object, and `map` if it is supplied, must have the same number of rows.
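
A quick way to guard against a length mismatch before filtering is to compare `length(map)` with the number of rows in the data. The helper below is a hypothetical convenience function, not part of TitanCNA, and is only a sketch of such a check.

```r
# Hypothetical pre-flight check; `checkLengths` is not a TitanCNA function.
checkLengths <- function(data, map = NULL) {
  # Only check when mappability scores are actually supplied.
  if (!is.null(map) && length(map) != nrow(data)) {
    stop("map has ", length(map), " scores but data has ",
         nrow(data), " rows")
  }
  invisible(TRUE)
}

# Toy input: two positions, two scores, so the check passes silently.
d <- data.frame(chr = c("1", "1"), tumDepth = c(20, 40))
checkLengths(d, map = c(0.95, 0.80))
```

Calling `checkLengths(d, map = 0.95)` on the same toy table would stop with an error, because one score cannot be matched to two rows.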

The same data.table object containing filtered components.

Gavin Ha <gavinha@gmail.com>

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

loadAlleleCounts

library(TitanCNA)

infile <- system.file("extdata", "test_alleleCounts_chr2.txt",
                      package = "TitanCNA")
tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA")
normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA")
gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA")
map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA")

#### LOAD DATA ####
data <- loadAlleleCounts(infile, genomeStyle = "NCBI")

#### GC AND MAPPABILITY CORRECTION ####
cnData <- correctReadDepth(tumWig, normWig, gc, map)


#### READ COPY NUMBER FROM HMMCOPY FILE ####
logR <- getPositionOverlap(data$chr, data$posn, cnData)
data$logR <- log(2^logR) #use natural logs

#### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc ####
filteredData <- filterData(data, as.character(1:24), minDepth = 10,
                           maxDepth = 200, map = NULL, mapThres = 0.9)