readVCF: Read SNP data in tabixed VCF format
In PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses

Description Usage Arguments Details Value Examples

This function reads tabixed VCF-files, as distributed from the 1000 Genomes project (human).

1
2
3

readVCF(filename, numcols, tid, frompos, topos,
        samplenames=NA, gffpath = FALSE, include.unknown=FALSE, approx=FALSE,
	out="", parallel=FALSE)

`filename`	the corresponding tabixed VCF-file
`numcols`	number of SNPs that should be read in as a chunk
`tid`	which chromosome ? (character)
`frompos`	start of the region
`topos`	end of the region
`samplenames`	a vector of individuals
`gffpath`	the corresponding GFF file
`include.unknown`	includ positions with unknown/missing nucleotides
`approx`	see details !
`out`	a folder suffix where the temporary files should be saved
`parallel`	parallel computation using mclapply

The readVCF function expects a tabixed VCF file with a diploid GT field.
In case of haploid data, the GT field has to be transformed to a pseudo-diploid
field (such as 0 -> 0|0). An alternative is to use readData(..., format="VCF"),
which can read non-tabixed haploid and any kind of polyploid VCFs directly.
When approx=TRUE, the algorithm will apply a logical OR to the GT-field:
(0|0=0,1|0=1,0|1=1,1|1=1). Note, this is an approximation for diploid data, which will
speed up calculations. In case of haploid data, approx should be switched to TRUE.
If approx=FALSE, the full diploid information will be considered.
The ff-package PopGenome uses to store the SNP information limits total data size to
individuals * (number of SNPs) <= .Machine$integer.max
In case of very large data sets, the bigmemory package will be used;
this will slow down calculations (e.g. this package have to be installed first !!!).
Use the function vcf_handle <-.Call("VCF_open", filename)
to open a VCF-file and .Call("VCF_getSampleNames",vcf_handle)
to get and define the individuals which should be considered in the analysis.
See also readData(..., format="VCF") !

The function creates an object of class "GENOME"

———————————————————
The following slots will be filled in the "GENOME" object
———————————————————

	Slot	Description
1.	`n.sites`	total number of sites
2.	`n.biallelic.sites`	number of biallelic sites
3.	`region.data`	some detailed information about the data read
4.	`region.names`	names of regions

# GENOME.class <- readVCF("...\chr1.vcf.gz", 1000, "1", 1, 100000)
# GENOME.class
# GENOME.class@region.names
# GENOME.class <- neutrality.stats(GENOME.class,FAST=TRUE)
# show the result:
# get.sum.data(GENOME.class)
# GENOME.class@region.data