haplotype: Function to load tumour allele counts from a text file or...


Description

Loads allele counts from tumour sequencing data, given either a delimited text file or a data.frame object.

Usage

    loadHaplotypeAlleleCounts(inCounts, cnfile, fun = "sum", haplotypeBinSize = 1e5, 
      minSNPsInBin = 3, chrs = c(1:22, "X"), minNormQual = 200, 
      genomeStyle = "NCBI", sep = "\t", header = TRUE, seqinfo = NULL,
      mapWig = NULL, mapThres = 0.9, centromere = NULL, minDepth = 10, maxDepth = 1000)
    
    getHaplotypesFromVCF(vcfFile, chrs = c(1:22, "X"), build = "hg19", genomeStyle = "NCBI",
      filterFlags = c("PASS", "10X_RESCUED_MOLECULE_HIGH_DIVERSITY"), 
      minQUAL = 100, minDepth = 10, minVAF = 0.25, altCountField = "AD", 
      keepGenotypes = c("1|0", "0|1", "0/1"), snpDB = NULL)
      
    loadBXcountsFromBEDDir(bxDir, chrs = c(1:22, "X", "Y"), minReads = 2)

Arguments

inCounts

Path to a text file, or a data.frame, containing tumour allele count data. inCounts must have 6 columns: chromosome, position, reference base, reference read counts, non-reference base, non-reference read counts. The ‘chromosome’ column can use either the ‘NCBI’ or ‘UCSC’ genome style; only autosomes, sex chromosomes, and the mitochondrial chromosome are included (e.g. 1-22, X, Y, MT). The reference and non-reference base columns can contain any arbitrary character; they are not used by TitanCNA.

cnfile

Path to file containing GC-bias and mappability corrected molecule coverage for a given bin size.

vcfFile

Path to phased variant VCF file from LongRanger 2.1. The file name must have the suffix *phase_variants.vcf.gz.

bxDir

Path to directory containing tumour BED files, one per chromosome, with BX tags.

fun

The function (‘SNP’, ‘sum’, ‘mean’) used to summarize counts within each user-defined bin (see haplotypeBinSize) and haplotype block, where blocks are defined by the phaseSet ID from the 9th column of inCounts. ‘SNP’ - uses the phased allele counts of each individual SNP; the phased allele for the higher coverage haplotype (determined within each bin) is chosen. ‘sum’ - uses the read count sum across all phased SNPs for the higher coverage haplotype within a bin, normalized by the total depth across all SNPs in the bin; each SNP in the bin is assigned this fraction. ‘mean’ - uses the mean (rounded) read count across all phased SNPs for the higher coverage haplotype within a bin, normalized by the mean (rounded) depth across all SNPs in the bin; each SNP in the bin is assigned this rounded count and depth.
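
The three summarization modes can be illustrated with a small base R sketch for a single bin. This is not the TitanCNA implementation: the helper summarizeBin and the columns hapA and hapB are hypothetical names introduced here, and the sketch assumes the SNPs are already phased and grouped into one bin of one phase set.

```r
# Toy phased read counts for the SNPs in one bin of one phase set.
# hapA and hapB are hypothetical column names, not TitanCNA's.
snps <- data.frame(hapA = c(30, 25, 40), hapB = c(10, 15, 20))
snps$depth <- snps$hapA + snps$hapB

# Hypothetical helper: returns the per-SNP count and depth for one bin.
summarizeBin <- function(snps, fun = c("sum", "mean", "SNP")) {
  fun <- match.arg(fun)
  n <- nrow(snps)
  # The haplotype with the larger total coverage in the bin is reported.
  high <- if (sum(snps$hapA) >= sum(snps$hapB)) snps$hapA else snps$hapB
  switch(fun,
    # 'SNP': keep each SNP's own phased count and depth
    SNP  = data.frame(ref = high, tumDepth = snps$depth),
    # 'sum': every SNP in the bin gets the bin-wide sums
    sum  = data.frame(ref = rep(sum(high), n),
                      tumDepth = rep(sum(snps$depth), n)),
    # 'mean': every SNP in the bin gets the rounded bin-wide means
    mean = data.frame(ref = rep(round(mean(high)), n),
                      tumDepth = rep(round(mean(snps$depth)), n))
  )
}

summarizeBin(snps, "sum")
summarizeBin(snps, "mean")
```

With ‘sum’ and ‘mean’, all SNPs in a bin share the same values, so the bin behaves as a single, higher-coverage observation; ‘SNP’ keeps per-site resolution.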

haplotypeBinSize

Bin size used to summarize SNPs based on phased haplotypes. See fun for the summarization approaches within a bin.

minSNPsInBin

The minimum number of SNPs required in each bin of size haplotypeBinSize for the bin to be analyzed. See fun for the summarization approaches within a bin.
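
As a rough illustration of how haplotypeBinSize and minSNPsInBin interact, the base R sketch below assigns toy SNP positions to fixed-width bins and drops bins with too few SNPs. The binning rule (windows counted from the chromosome start) is an assumption made for this example, and the sketch ignores phase sets, which the package also uses to define blocks.

```r
# Toy SNP positions on one chromosome (hypothetical data).
posn <- c(10500, 52000, 99000, 150000, 260000, 270000, 299999)
haplotypeBinSize <- 1e5
minSNPsInBin <- 3

# Assumed binning rule: fixed-width windows counted from the chromosome start.
bin <- floor(posn / haplotypeBinSize)

# Number of SNPs sharing each SNP's bin, aligned with posn.
snpsInBin <- ave(posn, bin, FUN = length)

# Keep only SNPs that fall in sufficiently populated bins.
kept <- posn[snpsInBin >= minSNPsInBin]
kept
```

Here the SNP at position 150000 is alone in its bin and is dropped, while the other two bins each hold three SNPs and are kept.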

chrs

Vector containing list of chromosomes to include in output.

minNormQual

Quality threshold used for filtering; SNPs with quality lower than this value are excluded. The quality can be any metric that reflects confidence that the locus is a true germline heterozygous SNP.

minReads

Minimum number of reads per barcode.

genomeStyle

The genome style to use for chromosomes. Use one of ‘NCBI’ or ‘UCSC’. The style found in inCounts does not matter; genomeStyle is the style that will be returned. Invokes setGenomeStyle.
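
The difference between the two styles is mainly the "chr" prefix. The base R sketch below shows the idea with a hypothetical helper, toGenomeStyle; it is not setGenomeStyle, and it deliberately ignores special cases such as the mitochondrial chromosome (MT versus chrM).

```r
# Hypothetical helper: convert chromosome names between NCBI ("1", "X")
# and UCSC ("chr1", "chrX") styles by removing or adding the "chr" prefix.
toGenomeStyle <- function(chr, genomeStyle = c("NCBI", "UCSC")) {
  genomeStyle <- match.arg(genomeStyle)
  bare <- sub("^chr", "", as.character(chr))
  if (genomeStyle == "UCSC") paste0("chr", bare) else bare
}

toGenomeStyle(c("chr1", "2", "chrX"), "NCBI")
toGenomeStyle(c("chr1", "2", "chrX"), "UCSC")
```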

build

Human genome reference build. Default: hg19.

snpDB

Path to SNP VCF file to use for specifying sites to retain.

minQUAL

Variants with quality (QUAL field) greater than or equal to this value will be retained.

minDepth

Variants with read depth greater than or equal to this value will be retained.

maxDepth

Variants with read depth less than or equal to this value will be retained.

minVAF

Variants with a variant/reference allele fraction of greater than or equal to this value will be retained.
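
All of these thresholds are inclusive. The base R sketch below applies them together to a toy table to make the boundary behaviour concrete. The column names are hypothetical, the allele fraction is assumed to be alternate reads divided by total depth, and the thresholds are combined here only for illustration (in the package they belong to different functions).

```r
# Toy variants (hypothetical columns): quality, total depth, alternate reads.
variants <- data.frame(
  qual  = c(150,  90,  300, 100, 200,  100),
  depth = c( 10,  50, 1200,   9,  40, 1000),
  alt   = c(  3,  25,  600,   5,   8,  250)
)
minQUAL <- 100; minDepth <- 10; maxDepth <- 1000; minVAF <- 0.25

# Assumed definition of the allele fraction: alternate reads / total depth.
vaf <- variants$alt / variants$depth

# Every comparison is inclusive, so values equal to a threshold pass.
keep <- variants$qual >= minQUAL &
  variants$depth >= minDepth &
  variants$depth <= maxDepth &
  vaf >= minVAF

variants[keep, ]
```

Only the first and last rows pass: the others fail on quality, maximum depth, minimum depth, and allele fraction respectively.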

altCountField

Specify the alternate count field name. Default is "AD".

keepGenotypes

Genotypes to retain. Default is to keep these genotype strings: 1|0, 0|1, 0/1.

filterFlags

Specify the FILTER flags to retain.

sep

Character indicating the column delimiter of the input file (inCounts). Default is tab-delimited, "\t".

header

Logical indicating whether the input tumour counts file contains a header line.

seqinfo

Seqinfo-class object describing chromosome information. If NULL, the hg19 seqinfo bundled with the package is loaded from system.file('extdata', 'Seqinfo_hg19.rda', package='TitanCNA').

mapWig

Mappability score WIG file for binned data.

mapThres

Minimum mappability score of region/sequence overlapping variants to retain.

centromere

File containing reference genome gap file representing centromere locations. Usually obtained from UCSC.

Value

loadHaplotypeAlleleCounts returns a data.table with the following components:

chr

Chromosome; character, genomeStyle naming convention

posn

Position; integer

phaseSet

Phase block identifier, numeric or character

refOriginal

Reference allele read count at SNP; numeric

tumDepthOriginal

Coverage at SNP; numeric

ref

Phased allele count values of higher coverage haplotype based on approach used (SNP, sum, mean); numeric

nonRef

Phased allele count values of lower coverage haplotype; tumDepth minus ref; numeric

tumDepth

Mean or sum of SNP read coverage; numeric

HaplotypeRatio

Sum of read coverage of phased alleles of higher coverage haplotype normalized by tumDepth; numeric

haplotypeCount

Phased allele read count; numeric
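
The count columns above are related by simple arithmetic. The sketch below checks those relationships on toy values for one bin summarized with fun = "sum"; the numbers and the variable name hapRatio are illustrative assumptions, not package output.

```r
# Toy values for three SNPs in one bin, summarized with fun = "sum".
ref      <- c(95, 95, 95)      # higher coverage haplotype count
tumDepth <- c(140, 140, 140)   # summed depth across the bin

# nonRef is defined as tumDepth minus ref.
nonRef <- tumDepth - ref

# Haplotype ratio: higher coverage haplotype count normalized by tumDepth.
hapRatio <- ref / tumDepth

data.frame(ref, nonRef, tumDepth, hapRatio)
```

Because ref always refers to the higher coverage haplotype, the ratio is expected to be at least 0.5.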

getHaplotypesFromVCF returns a list containing 2 components:

vcf.filtered

VCF object containing the list of heterozygous variants after filtering.

geno.gr

GRanges object containing the genotype information of the VCF

Author(s)

Gavin Ha <gavinha@gmail.com>

References

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

See Also

loadDefaultParameters, plotHaplotypeFraction

Examples

  ## Not run: 
  infile <- "test_alleleCounts_chr2_with_phaseInfo.txt"
  haplotypeBinSize <- 1e5
  phaseSummarizeFun <- "sum"
  ## will load seqinfo_hg19 provided by TitanCNA package
  data <- loadHaplotypeAlleleCounts(infile, fun = phaseSummarizeFun,
      haplotypeBinSize = haplotypeBinSize, minSNPsInBin = 3, 
      chrs = c(1:22, "X"), minNormQual = 200, 
      genomeStyle = "NCBI", seqinfo = NULL)
  
## End(Not run)
  
  ## Not run: 
  vcfFile <- "test.vcf"
  hap <- getHaplotypesFromVCF(vcfFile, chrs = c(1:22,"X"), build = "hg19",
    filterFlags = c("PASS", "10X_RESCUED_MOLECULE_HIGH_DIVERSITY"), 
    minQUAL = 100, minDepth = 10, minVAF = 0.25, 
    keepGenotypes = c("1|0", "0|1", "0/1"))
  
  
## End(Not run)

gavinha/TitanCNA documentation built on April 22, 2021, 9:38 a.m.