filterData: Filter list object based on read depth and missing data and...


View source: R/utils.R

Description

Filters all vectors in the list based on the specified chromosome(s) of interest, minimum and maximum read depths, missing data, and a mappability score threshold.

Usage

filterData(data, chrs = NULL, minDepth = 10, maxDepth = 200,
    positionList = NULL, map = NULL, mapThres = 0.9,
    centromeres = NULL, centromere.flankLength = 0)

Arguments

data

data.table object that contains an arbitrary number of components. It should include ‘chr’ and ‘tumDepth’. All components must have the same number of rows, where each row holds the information for one chromosomal position.

chrs

Character string or character vector specifying the chromosomes to keep. Chromosomes not included in this vector will be filtered out. The chromosome style must match the genomeStyle used when running loadAlleleCounts.

minDepth

Integer specifying the minimum tumour read depth to include. Positions with tumour depth >= minDepth are kept.

maxDepth

Integer specifying the maximum tumour read depth to include. Positions with tumour depth <= maxDepth are kept.

positionList

data.frame with two columns: ‘chr’ and ‘posn’. positionList lists the chromosomal positions to use in the analysis. All positions not overlapping this list will be excluded. Use NULL to use all current positions in data.

map

Numeric array containing the mappability (map) score for each position in data. Optional; supply it to filter positions based on mappability scores.

mapThres

Numeric value specifying the mappability score threshold. Only applies if map is specified. Positions with map scores >= mapThres are kept.

centromeres

data.frame listing the centromere regions. It should contain 3 columns: chr, start, and end. If this argument is supplied, data at and flanking the centromeres will be removed.

centromere.flankLength

Integer indicating the length (in base pairs) to the left and to the right of the centromere designated for removal of data.
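
The keep/drop rules described by the arguments above can be illustrated with a small base-R sketch. This is not the TitanCNA implementation: the toy values, the names toy, toyMap, toyCentromeres and flank, and the inclusive treatment of the flanked centromere boundaries are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the code used inside filterData().
# All values and object names below are hypothetical.
toy <- data.frame(
  chr      = c("1", "1", "1", "2", "2", "2", "3"),
  posn     = c(100, 2000, 5000, 250, 700, 9000, 400),
  tumDepth = c(5, 50, 60, 200, 40, 250, 30),
  stringsAsFactors = FALSE
)
toyMap <- c(1.0, 0.95, 1.0, 0.9, 0.5, 1.0, 1.0)  # one score per row of toy

chrs     <- c("1", "2")
minDepth <- 10
maxDepth <- 200
mapThres <- 0.9

# Hypothetical centromere table with the documented columns
toyCentromeres <- data.frame(chr = "1", start = 4000, end = 6000,
                             stringsAsFactors = FALSE)
flank <- 1000  # plays the role of centromere.flankLength

keepChr   <- toy$chr %in% chrs                        # chromosomes of interest
keepDepth <- toy$tumDepth >= minDepth &
             toy$tumDepth <= maxDepth                 # both bounds inclusive
keepMap   <- toyMap >= mapThres                       # threshold inclusive

# TRUE where a position lies inside a flank-extended centromere
# (inclusive boundaries are an assumption of this sketch)
inCentromere <- vapply(seq_len(nrow(toy)), function(i) {
  any(toyCentromeres$chr == toy$chr[i] &
      toy$posn[i] >= toyCentromeres$start - flank &
      toy$posn[i] <= toyCentromeres$end + flank)
}, logical(1))

toyFiltered <- toy[keepChr & keepDepth & keepMap & !inCentromere, ]
```

In this toy input, only the positions 2000 on chromosome 1 and 250 on chromosome 2 pass every rule; the latter sits exactly on the maxDepth and mapThres boundaries, which the documented >= and <= comparisons retain.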

Details

All vectors in the input data.table object, as well as map when it is supplied, must have the same number of rows.
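
One way to verify this requirement before filtering is a small pre-flight check, sketched below in base R. The helper checkInputLengths is hypothetical and not part of TitanCNA; it only assumes that data behaves like a list of equal-length columns.

```r
# Hypothetical helper; not a TitanCNA function.
checkInputLengths <- function(data, map = NULL) {
  # Length of every component (column) of the input
  lens <- vapply(data, length, integer(1))
  if (length(unique(lens)) > 1) {
    stop("components of 'data' differ in length")
  }
  # map, when given, needs one score per row of data
  if (!is.null(map) && length(lens) > 0 && length(map) != lens[[1]]) {
    stop("'map' must have one score per row of 'data'")
  }
  invisible(TRUE)
}

toyData <- data.frame(chr = c("1", "2"), tumDepth = c(10, 20),
                      stringsAsFactors = FALSE)
checkInputLengths(toyData, map = c(1.0, 0.9))  # passes silently
```

Calling the helper with a map vector of the wrong length stops with an error instead of letting a misaligned filter run.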

Value

The same data.table object containing filtered components.

Author(s)

Gavin Ha <gavinha@gmail.com>

References

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

See Also

loadAlleleCounts

Examples

library(TitanCNA)

infile <- system.file("extdata", "test_alleleCounts_chr2.txt", 
                      package = "TitanCNA")
tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA")
normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA")
gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA")
map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA")

#### LOAD DATA ####
data <-  loadAlleleCounts(infile, genomeStyle = "NCBI")

#### GC AND MAPPABILITY CORRECTION ####
cnData <- correctReadDepth(tumWig, normWig, gc, map)


#### READ COPY NUMBER FROM HMMCOPY FILE ####
logR <- getPositionOverlap(data$chr, data$posn, cnData)
data$logR <- log(2^logR) #use natural logs

#### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc ####
filteredData <- filterData(data, as.character(1:24), minDepth = 10,
                           maxDepth = 200, map = NULL, mapThres = 0.9)

TitanCNA documentation built on Nov. 8, 2020, 8:14 p.m.