| vcftobd | R Documentation |
Reads the DS (dosage) and GP (genotype probabilities) FORMAT fields from a bgzipped, tabix-indexed VCF file, such as those produced by imputation servers like the Michigan Imputation Server, and writes a pair of Format 5 BinaryDosage files: a .bdose data file and a companion .bdi metadata file.
vcftobd(
  vcffile,
  bdose_file,
  region = NULL,
  snpidformat = 0L,
  bdoptions = character(0)
)
vcffile | Path to the bgzipped, tabix-indexed VCF file. |
bdose_file | Path for the output .bdose file. The companion .bdi metadata file is written to |
region | Optional genomic region string in bcftools format (e.g. |
snpidformat | Integer controlling how SNP IDs are stored. |
bdoptions | Character vector specifying which per-SNP statistics to store. Any combination of |
The .bdose file begins with a 4-byte magic number followed by one gzip-compressed block per SNP. Each block contains the DS values for all samples followed by the GP values, encoded as unsigned 16-bit integers (round(value * 10000); 0xffff = missing).
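The value encoding described above can be illustrated with a minimal sketch. The helper names `encode_u16` and `decode_u16` are illustrative and are not part of the package; the sketch works on integer codes only, because the byte order used on disk is not stated here.

```r
# Sentinel for a missing value (0xffff), as described above.
BD_MISSING <- 65535L

# Encode dosages or probabilities as 16-bit codes: round(value * 10000).
encode_u16 <- function(x) {
  code <- as.integer(round(x * 10000))
  code[is.na(x)] <- BD_MISSING
  code
}

# Decode 16-bit codes back to numeric values, mapping the sentinel to NA.
decode_u16 <- function(code) {
  value <- code / 10000
  value[code == BD_MISSING] <- NA_real_
  value
}

ds      <- c(0, 0.5, 1.2346, 2, NA)   # toy dosage values, one per sample
codes   <- encode_u16(ds)
decoded <- decode_u16(codes)
```

Because values are scaled by 10000 before rounding, the stored resolution is 1e-4; anything finer in the VCF is lost on conversion.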
The .bdi file is an RDS-serialised R list of class "genetic-info" with the following elements:
Path to the associated .bdose file.
Logical; always FALSE for VCF-sourced files.
data.frame with columns fid (empty) and sid (sample IDs).
Logical; TRUE if all SNPs are on a single chromosome.
Numeric; resolved SNP ID format (see snpidformat parameter).
data.frame with columns chromosome, location, snpid, reference, alternate.
Named list of per-SNP annotations requested via bdoptions. Each element is a numeric vector of length equal to the number of SNPs. Values are read from the VCF INFO column when available for the first SNP (AF for aaf, MAF for maf, R2 for rsq); otherwise they are calculated from the dosage values.
List of class "bdose-info" with format, subformat, headersize, numgroups, and groups.
Integer vector of length 0 (unused in Format 5).
Numeric vector of byte offsets into .bdose, one per SNP.
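As a rough illustration of how per-SNP byte offsets allow random access to compressed blocks, the sketch below builds a toy file and reads one block back. It is not the package's reader. It assumes that a block's length is the distance to the next offset (or to the end of the file), it uses R's `memCompress(type = "gzip")` framing, and it writes a placeholder 4-byte header rather than the real magic number. The name `read_block` is hypothetical.

```r
# Build a toy file: a 4-byte placeholder header followed by two compressed blocks.
tmp <- tempfile(fileext = ".bin")
blocks <- list(
  memCompress(as.raw(1:10),  type = "gzip"),
  memCompress(as.raw(11:30), type = "gzip")
)
header <- as.raw(c(0, 0, 0, 0))   # placeholder, NOT the real magic number
con <- file(tmp, "wb")
writeBin(header, con)
for (b in blocks) writeBin(b, con)
close(con)

# Byte offset of each block, one per SNP (toy stand-in for the offset vector).
sizes   <- vapply(blocks, length, integer(1))
offsets <- length(header) + c(0, cumsum(sizes)[-length(sizes)])

# Seek to block i, read up to the next offset (assumed layout), and decompress.
read_block <- function(path, offsets, i) {
  end <- if (i < length(offsets)) offsets[i + 1] else file.size(path)
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = offsets[i], origin = "start")
  compressed <- readBin(con, what = "raw", n = end - offsets[i])
  memDecompress(compressed, type = "gzip")
}

block2 <- read_block(tmp, offsets, 2)
```

Storing offsets rather than block sizes means a reader can jump straight to any SNP without decompressing the ones before it.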
Returns NULL invisibly; the function is called for its side effect of writing the .bdose and .bdi files.
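A call might look like the following sketch. The file paths and the region are hypothetical placeholders, and the `bdoptions` values are inferred from the aaf, maf and rsq names mentioned in the description of the .bdi file. The call is guarded so that the snippet does nothing when the input file or the function is unavailable.

```r
# Hypothetical paths; replace with your own files.
vcf_path   <- "chr22.dose.vcf.gz"
bdose_path <- file.path(tempdir(), "chr22.bdose")

# Run only when the input file and vcftobd() are actually available.
if (file.exists(vcf_path) && exists("vcftobd")) {
  vcftobd(
    vcffile    = vcf_path,
    bdose_file = bdose_path,
    region     = "chr22:16000000-17000000",  # illustrative chrom:start-end region
    bdoptions  = c("aaf", "rsq")             # assumed option names, see lead-in
  )
}
```

The chromosome label in `region` must match the naming used in the VCF (for example "chr22" versus "22"), so check the file's contig names before restricting the conversion.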