ReadVCFDataChunk: User Constructor for class. Calls VCFData constructor:...


Description

User constructor for the class; calls the VCFData constructor. ReadVCFDataChunk is a wrapper for readVcfAsVRanges and is a multi-core version of ReadVCFData. It removes indels, GL chromosomes, and MULTI calls. It scans the header of the VCF file and adds the following fields for analysis if present: AD, GT, DP, GQ. It also looks for the "END" tag in the header and reads the file as a gVCF if necessary.

Note that the input file must be zipped and have a corresponding tabix file.

The function drops all hom ref sites that are not in the admixture file but retains the counts of hom ref and multi calls in the VCF file. As a result, a few of the metrics and the hom ref plot can no longer be calculated in VCFQAReport. Any metric that can no longer be calculated is not output.

If a filter is used on the data (e.g. gq.filter), it is not applied to the number of hom ref calls or the total number of calls. The filter is applied in the VCFQAReport step, whereas these two counts are calculated while the file is being read.

Keep the memory requirements in mind when calling this function. For example, if numcores=6, you might request 12 GB per core when submitting the job (72 GB total). However, the VCF in memory must still fit back onto a single core, or R will not be able to allocate the memory.

The example given here is for illustration only; it is not a realistic use case because the file includes only chromosome 22.
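The memory arithmetic above can be sketched in a few lines of base R. The core count and per-core request are the figures from the description; the size of the combined object (combined_object_gb) is a made-up value for illustration, and none of these variable names come from genotypeeval.

```r
# Figures from the description's example
numcores <- 6
mem_per_core_gb <- 12

# Total memory requested for the job across all cores
total_request_gb <- numcores * mem_per_core_gb

# ASSUMPTION: hypothetical size of the combined VCF object once the
# chunks are gathered; replace with an estimate for your own data
combined_object_gb <- 10

# The combined object has to fit in the memory of a single core,
# regardless of how much was requested in total
fits_on_one_core <- combined_object_gb <= mem_per_core_gb
```

If fits_on_one_core is FALSE, requesting more cores does not help; the per-core allocation is the limit that matters.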

Usage

ReadVCFDataChunk(mydir, myfile, genome, admixture.ref, numcores)

Arguments

mydir

Directory of vcf file

myfile

Filename of vcf file (zipped)

genome

Genome build, either "GRCh37" or "GRCh38"

admixture.ref

VRanges with MAF for superpopulations (EAS, AFR, EUR)

numcores

Number of cores to read in VCF (passed to bplapply)
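Because the input must be zipped and have a tabix file, it can be worth checking both before starting a long multi-core read. The following is a minimal sketch in base R. The helper check_vcf_inputs is hypothetical and not part of genotypeeval, and it assumes the tabix index sits next to the VCF with a ".tbi" suffix, which is the usual tabix convention but is not stated in this page.

```r
# Hypothetical helper (not part of genotypeeval): checks that the VCF
# exists, has a .gz extension, and has an index file alongside it.
check_vcf_inputs <- function(mydir, myfile) {
  # Tolerate a trailing slash on mydir, as used in the package example
  vcf_path <- file.path(sub("/+$", "", mydir), myfile)
  c(
    vcf_exists = file.exists(vcf_path),
    is_gz      = grepl("\\.gz$", myfile),
    # ASSUMPTION: index is named <file>.tbi in the same directory
    has_tbi    = file.exists(paste0(vcf_path, ".tbi"))
  )
}
```

A call such as all(check_vcf_inputs(mydir, myfile)) returning FALSE suggests the file should be compressed and indexed before ReadVCFDataChunk is called.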

Value

Object of type VCFData

Examples

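library(genotypeeval)

# Locate the example VCF shipped with the package
vcffn <- system.file("ext-data", "chr22.GRCh38.vcf.gz", package = "genotypeeval")
mydir <- paste(dirname(vcffn), "/", sep = "")
myfile <- basename(vcffn)
# Defined in the original example but not passed to the calls below
svp <- ScanVcfParam(which = GRanges("22", IRanges(0, 1e5)), geno = "GT")
vcf <- ReadVCFData(mydir, myfile, "GRCh38")

# Toy admixture reference, variant calls: EAS_AF is 1 for "1|1", 0.5 otherwise
admix.var <- getVR(vcf)[getVR(vcf)$GT %in% c("0|1", "1|0", "1|1"), ][, 1:2]
admix.var$EAS_AF <- ifelse(admix.var$GT %in% c("1|1"), 1, 0.5)
admix.var$AFR_AF <- 0
admix.var$EUR_AF <- 0

# Toy admixture reference, hom ref calls ("0|0")
admix.hom <- getVR(vcf)[getVR(vcf)$GT %in% c("0|0"), ][, 1:2]
admix.hom$EAS_AF <- 0
admix.hom$AFR_AF <- 1
admix.hom$EUR_AF <- 1

# Combine both sets and read the VCF in chunks on two cores
admix.ref <- c(admix.var, admix.hom)
ReadVCFDataChunk(mydir, myfile, "GRCh38", admix.ref, numcores = 2)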

jentom/genotypeeval documentation built on May 13, 2019, 12:54 p.m.