Description Usage Arguments Details Value Examples
This function reads tabixed VCF-files, as distributed from the 1000 Genomes project (human).
1 2 3 |
filename |
the corresponding tabixed VCF-file |
numcols |
number of SNPs that should be read in as a chunk |
tid |
which chromosome ? (character) |
frompos |
start of the region |
topos |
end of the region |
samplenames |
a vector of individuals |
gffpath |
the corresponding GFF file |
include.unknown |
includ positions with unknown/missing nucleotides |
approx |
see details ! |
out |
a folder suffix where the temporary files should be saved |
parallel |
parallel computation using mclapply |
The readVCF function expects a tabixed VCF file with a diploid GT field.
In case of haploid data, the GT field has to be transformed to a pseudo-diploid
field (such as 0 -> 0|0). An alternative is to use readData(..., format="VCF"),
which can read non-tabixed haploid and any kind of polyploid VCFs directly.
When approx=TRUE
, the algorithm will apply a logical OR to the GT-field:
(0|0=0,1|0=1,0|1=1,1|1=1). Note, this is an approximation for diploid data, which will
speed up calculations. In case of haploid data, approx
should be switched to TRUE
.
If approx=FALSE
, the full diploid information will be considered.
The ff-package PopGenome uses to store the SNP information limits total data size to
individuals * (number of SNPs) <= .Machine$integer.max
In case of very large data sets, the bigmemory package will be used;
this will slow down calculations (e.g. this package have to be installed first !!!).
Use the function vcf_handle <-.Call("VCF_open", filename)
to open a VCF-file and .Call("VCF_getSampleNames",vcf_handle)
to get and define the individuals which should be considered in the analysis.
See also readData(..., format="VCF") !
The function creates an object of class "GENOME"
———————————————————
The following slots will be filled in the "GENOME"
object
———————————————————
Slot | Description | |
1. | n.sites | total number of sites |
2. | n.biallelic.sites | number of biallelic sites |
3. | region.data | some detailed information about the data read |
4. | region.names | names of regions |
1 2 3 4 5 6 7 | # GENOME.class <- readVCF("...\chr1.vcf.gz", 1000, "1", 1, 100000)
# GENOME.class
# GENOME.class@region.names
# GENOME.class <- neutrality.stats(GENOME.class,FAST=TRUE)
# show the result:
# get.sum.data(GENOME.class)
# GENOME.class@region.data
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.