read_vcf_cpp | R Documentation |
For each VCF record, the information in the INFO field takes priority. If it is missing, the information is guessed from the REF/ALT sequences. If multiple alleles are defined in ALT, they are split, and the allele count of each is extracted from the GT field.
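As a rough illustration of the fallback path, the sketch below splits a multi-allelic ALT and guesses each allele's type and size from the REF/ALT length difference. `guess_alleles` is a hypothetical helper, not part of sveval. The rule it applies (longer ALT means insertion, shorter ALT means deletion, size is the absolute length difference) is an assumption for illustration only; the C++ code behind `read_vcf_cpp` may use different rules.

```r
# Hypothetical helper, not part of sveval: guess the type and size of each
# ALT allele from REF/ALT lengths, as a fallback when INFO is missing.
guess_alleles <- function(ref, alt) {
  alts <- strsplit(alt, ",", fixed = TRUE)[[1]]  # split multi-allelic ALT
  size_diff <- nchar(alts) - nchar(ref)          # > 0 insertion, < 0 deletion
  data.frame(
    allele = seq_along(alts),
    alt = alts,
    type = ifelse(size_diff > 0, "INS", ifelse(size_diff < 0, "DEL", "OTHER")),
    size = abs(size_diff),
    stringsAsFactors = FALSE
  )
}

# A 12 bp REF with two ALT alleles: a 1 bp ALT (deletion) and a 24 bp ALT (insertion)
guess_alleles(strrep("ACGT", 3), paste("A", strrep("ACGT", 6), sep = ","))
```

Each ALT allele becomes its own row, mirroring the per-allele output described above.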
read_vcf_cpp(
  filename,
  use_gz,
  sample_name = "",
  min_sv_size = 10L,
  shorten_ref = TRUE,
  shorten_alt = TRUE,
  gq_field = "GQ",
  check_inv = FALSE,
  keep_nocalls = FALSE,
  other_fields = as.character(c())
)
filename | The path to the VCF file (unzipped or gzipped). |
use_gz | Whether the VCF file is gzipped. |
sample_name | Which sample to process. If not found, the first sample in the VCF file is used. If "*", no sample selection is performed. |
min_sv_size | Minimum variant size to keep, in bp. Variants shorter than this are skipped. Default is 10. |
shorten_ref | Whether the REF sequence should be shortened to its first 10 bp. Default is TRUE. |
shorten_alt | Whether the ALT sequence should be shortened to its first 10 bp. Default is TRUE. |
gq_field | Which FORMAT field to use as the genotype quality. Default is "GQ". If that field is not found, QUAL is used instead. |
check_inv | Whether to guess if a variant is an inversion by aligning REF with the reverse complement of ALT. If the two are more than 80% similar (and both REF and ALT are longer than 10 bp), the variant is classified as INV. Default is FALSE. |
keep_nocalls | Whether to keep variants/alleles with missing genotypes (e.g. "./."). Default is FALSE. |
other_fields | Name of another INFO field to extract. Empty by default. |
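The `check_inv` heuristic can be sketched as follows. `revcomp` and `looks_like_inversion` are hypothetical helpers, not sveval functions. The documented check *aligns* REF with the reverse complement of ALT; this sketch swaps in plain position-wise identity, which only works for equal-length sequences, so it is a simplification of the idea rather than the package's method. The 10 bp and 80% thresholds come from the argument description.

```r
# Reverse complement of a DNA string (non-ACGT characters are kept as-is).
revcomp <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

# Hypothetical simplification of check_inv: position-wise identity between
# REF and revcomp(ALT) instead of a real alignment.
looks_like_inversion <- function(ref, alt, min_len = 10, min_sim = 0.8) {
  if (nchar(ref) <= min_len || nchar(alt) <= min_len) return(FALSE)
  if (nchar(ref) != nchar(alt)) return(FALSE)  # identity needs equal lengths
  a <- strsplit(toupper(ref), "", fixed = TRUE)[[1]]
  b <- strsplit(toupper(revcomp(alt)), "", fixed = TRUE)[[1]]
  mean(a == b) > min_sim
}

ref <- "AAACCCGGGTTTA"
looks_like_inversion(ref, revcomp(ref))  # ALT is exactly the inverted REF
```

A real aligner tolerates small indels between REF and the inverted ALT, which position-wise identity does not.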
Alleles are split and, for each, column 'ac' reports the allele count. Notable cases include 'ac=-1' for no/missing calls (e.g. './.') and 'ac=0' on the first allele to report homozygous reference (hom ref) variants. These cases are often filtered out later with 'ac>0' to keep only non-reference calls. If the VCF contains no samples, or if no sample selection is forced (sample_name='*'), 'ac' is '-1' for all variants in the VCF.
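The 'ac' convention and the usual 'ac>0' filter can be illustrated with the sketch below. `allele_counts` is a hypothetical helper, not part of sveval. It assumes a single GT string with "/" or "|" separators and, as a simplifying assumption, reports -1 for every allele as soon as any call in the genotype is missing. The toy data.frame only mimics the 'ac' column; its `id` column is illustrative and not the real output layout.

```r
# Hypothetical helper, not part of sveval: count copies of each ALT allele
# in a GT string, using -1 for no/missing calls.
allele_counts <- function(gt, n_alt) {
  calls <- strsplit(gt, "[/|]")[[1]]              # unphased or phased separator
  if (any(calls == ".")) return(rep(-1L, n_alt))  # no/missing call
  calls <- as.integer(calls)
  vapply(seq_len(n_alt), function(i) sum(calls == i), integer(1))
}

allele_counts("1/2", 2)  # 1 1: one copy of each ALT allele
allele_counts("0/0", 1)  # 0: hom ref
allele_counts("./.", 1)  # -1: missing call

# Toy table mimicking the 'ac' column; keep only non-reference calls.
svs <- data.frame(id = c("sv1", "sv2", "sv3"), ac = c(1L, 0L, -1L),
                  stringsAsFactors = FALSE)
non_ref <- svs[svs$ac > 0, ]
```

Filtering on 'ac>0' drops both the hom ref rows (0) and the missing calls (-1) in one step.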
A data.frame with variant and genotype information.
Jean Monlong