High-speed, high-specialisation population-scale whole-genome variation and sequence data access


WhopGenome provides high-speed read access to Variant Call Format (VCF) files through C functions, with many specialised output formats and a configurable filtering engine. It can index FASTA files and any file format that uses tab-separated columns, such as GFF, VCF and METAL, in preparation for high-speed access, and it can read specified subsections of indexed FASTA files very quickly. It also provides easy-to-use methods to access the UCSC Genome Browser SQL servers, the AmiGO gene ontology databases, PLINK .PED files and Bioconductor's organism annotation databases.


Package: WhopGenome
Type: Package
Version: 1.0
Date: 2013-01-24
License: GPL-2

- Open a VCF file with handle <- vcf_open("filename")
- Set a region of interest (chromosome/contig ID, start position, end position) with vcf_setregion(handle, "X", 200000, 300000)
- Select samples of interest (in this case the first 10): vcf_selectsamples(handle, vcf_getSampleNames(handle)[1:10])
- Read from the file via resvec <- vcf_readLineVec(handle)
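The steps above can be combined into one small helper. This is only a sketch under stated assumptions: read_region_line() is a hypothetical name that is not part of WhopGenome, "example.vcf.gz" is a placeholder path, and the file is assumed to be a compressed, tabix-indexed VCF. The WhopGenome function names are taken as written in the list above. The helper returns NULL when the package or the file is unavailable, so it can be sourced safely on any machine.

```r
# Hypothetical helper (not part of WhopGenome) wrapping the four steps above.
read_region_line <- function(vcf_path, chrom, start, end, n_samples = 10) {
  # Return NULL instead of failing when the package or the file is missing.
  if (!requireNamespace("WhopGenome", quietly = TRUE) || !file.exists(vcf_path)) {
    return(NULL)
  }

  # Step 1: open the VCF file and keep the handle.
  handle <- WhopGenome::vcf_open(vcf_path)

  # Step 2: restrict reading to the region of interest.
  WhopGenome::vcf_setregion(handle, chrom, start, end)

  # Step 3: select at most the first n_samples samples.
  sample_names <- WhopGenome::vcf_getSampleNames(handle)
  keep <- sample_names[seq_len(min(n_samples, length(sample_names)))]
  WhopGenome::vcf_selectsamples(handle, keep)

  # Step 4: read one line from the selected region.
  WhopGenome::vcf_readLineVec(handle)
}

# Placeholder path; the result is NULL unless the package and file both exist.
resvec <- read_region_line("example.vcf.gz", "X", 200000, 300000)
```

Checking for the package and the file up front is a design choice made for this sketch; in an interactive session it may be preferable to let vcf_open() report the problem directly.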


Ulrich Wittelsbuerger ulrich.wittelsbuerger@uni-duesseldorf.de


The 1000 Genomes Project http://1000genomes.org/

The 1000 Genomes Project Consortium (2010), A map of human genome variation from population-scale sequencing. Nature *467*, 1061-1073.

Heng Li (2011), Tabix: Fast retrieval of sequence features from generic TAB-delimited files, Bioinformatics, doi: 10.1093/bioinformatics/btq671

The Variant Call Format http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41


#vcfh <- .Call("VCF_open","/data/vcf/1000g/ALL.Chromosome1.consensus.vcf.gz",PACKAGE="WhopGenome")
