readVCF: Read SNP data in tabixed VCF format

Description Usage Arguments Details Value Examples

View source: R/readVCF.R

Description

This function reads tabixed VCF-files, as distributed from the 1000 Genomes project (human).

Usage

1
2
3
readVCF(filename, numcols, tid, frompos, topos,
        samplenames=NA, gffpath = FALSE, include.unknown=FALSE, approx=FALSE,
	out="", parallel=FALSE)

Arguments

filename

the corresponding tabixed VCF-file

numcols

number of SNPs that should be read in as a chunk

tid

which chromosome ? (character)

frompos

start of the region

topos

end of the region

samplenames

a vector of individuals

gffpath

the corresponding GFF file

include.unknown

includ positions with unknown/missing nucleotides

approx

see details !

out

a folder suffix where the temporary files should be saved

parallel

parallel computation using mclapply

Details

The readVCF function expects a tabixed VCF file with a diploid GT field.
In case of haploid data, the GT field has to be transformed to a pseudo-diploid
field (such as 0 -> 0|0). An alternative is to use readData(..., format="VCF"),
which can read non-tabixed haploid and any kind of polyploid VCFs directly.
When approx=TRUE, the algorithm will apply a logical OR to the GT-field:
(0|0=0,1|0=1,0|1=1,1|1=1). Note, this is an approximation for diploid data, which will
speed up calculations. In case of haploid data, approx should be switched to TRUE.
If approx=FALSE, the full diploid information will be considered.
The ff-package PopGenome uses to store the SNP information limits total data size to
individuals * (number of SNPs) <= .Machine$integer.max
In case of very large data sets, the bigmemory package will be used;
this will slow down calculations (e.g. this package have to be installed first !!!).
Use the function vcf_handle <-.Call("VCF_open", filename)
to open a VCF-file and .Call("VCF_getSampleNames",vcf_handle)
to get and define the individuals which should be considered in the analysis.
See also readData(..., format="VCF") !

Value

The function creates an object of class "GENOME"

———————————————————
The following slots will be filled in the "GENOME" object
———————————————————

Slot Description
1. n.sites total number of sites
2. n.biallelic.sites number of biallelic sites
3. region.data some detailed information about the data read
4. region.names names of regions

Examples

1
2
3
4
5
6
7
# GENOME.class <- readVCF("...\chr1.vcf.gz", 1000, "1", 1, 100000)
# GENOME.class
# GENOME.class@region.names
# GENOME.class <- neutrality.stats(GENOME.class,FAST=TRUE)
# show the result:
# get.sum.data(GENOME.class)
# GENOME.class@region.data

PopGenome documentation built on Feb. 1, 2020, 1:07 a.m.