High-speed, high-specialisation population-scale whole-genome variation and sequence data access
WhopGenome provides fast read access to Variant Call Format (VCF) files through C functions, with many specialised output formats and a configurable filtering engine. It can index FASTA files and any file format made of tab-separated columns, such as GFF, VCF and METAL, in preparation for high-speed access, and it can read specified subsections of indexed FASTA files very quickly. It also provides easy-to-use methods to access the UCSC Genome Browser SQL servers, the AmiGO gene ontology databases, PLINK .PED files and Bioconductor's organism annotation databases.
- Open a VCF file with handle <- vcf_open("filename")
- Set a region of interest (chromosome/contig ID, start position, end position) with vcf_setregion(handle, "X", 200000, 300000)
- Select samples of interest (here, the first 10) with vcf_selectsamples(handle, vcf_getSampleNames(handle)[1:10])
- Read from the file with resvec <- vcf_readLineVec(handle)
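The four steps above can be combined into one reusable helper. This is a minimal sketch, not part of WhopGenome itself: it assumes the package is installed and that the input is a bgzip-compressed, tabix-indexed VCF file. The function name read_selected_region() and the file path in the usage line are hypothetical, and the call to vcf_close() to release the handle is an assumption about the package API.

```r
# Hypothetical helper (not part of WhopGenome): runs the quick-start steps
# and returns the result of a single vcf_readLineVec() call.
# WhopGenome functions are called with the :: prefix, so they are resolved
# only when the helper is called, not when it is defined.
read_selected_region <- function(path, contig, from, to, n_samples = 10) {
  handle <- WhopGenome::vcf_open(path)                   # open the VCF file
  on.exit(WhopGenome::vcf_close(handle), add = TRUE)    # assumed API: close the handle on exit
  WhopGenome::vcf_setregion(handle, contig, from, to)   # restrict reading to contig:from-to
  sample_names <- WhopGenome::vcf_getSampleNames(handle)
  # head() avoids NA entries when the file has fewer than n_samples samples
  WhopGenome::vcf_selectsamples(handle, head(sample_names, n_samples))
  WhopGenome::vcf_readLineVec(handle)                   # read from the selected region
}

# Usage (hypothetical file path):
# resvec <- read_selected_region("example.vcf.gz", "X", 200000, 300000)
```

Using head() instead of [1:10] keeps the selection valid when a file contains fewer samples than requested, and on.exit() releases the handle even if a later step fails.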
Ulrich Wittelsbuerger email@example.com
The 1000 Genomes Project http://1000genomes.org/
The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
Heng Li (2011). Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. doi:10.1093/bioinformatics/btq671
# Commented out because it requires a local copy of the 1000 Genomes VCF file:
# vcfh <- vcf_open("/data/vcf/1000g/ALL.Chromosome1.consensus.vcf.gz")