read_vcf_multisamps_cpp: Read VCF using CPP reader

View source: R/RcppExports.R

Description

For each VCF record, the information in the INFO field is used in priority. If it is missing, the information is guessed from the REF and ALT sequences. If multiple alleles are defined in ALT, they are split and the allele count of each is extracted from the GT field.
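
The priority order described above can be sketched in C++, the language of the underlying reader. This is a minimal, hypothetical illustration and not sveval's implementation: the INFO keys SVTYPE and SVLEN, the SvGuess struct and the guess_sv helper are assumptions, and the fallback guess only separates insertions from deletions by the REF/ALT length difference.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical container for the type/size guess of one ALT allele.
struct SvGuess {
  std::string type;  // "INS", "DEL", or "" when undecided
  long size;         // size in bp
};

// Hypothetical helper: use SVTYPE/SVLEN from INFO when both are present
// (assumed keys); otherwise guess from the REF/ALT length difference.
SvGuess guess_sv(const std::map<std::string, std::string>& info,
                 const std::string& ref, const std::string& alt) {
  auto t = info.find("SVTYPE");
  auto l = info.find("SVLEN");
  if (t != info.end() && l != info.end()) {
    // SVLEN is negative for deletions, so keep its absolute value.
    return {t->second, std::labs(std::stol(l->second))};
  }
  long diff = static_cast<long>(alt.size()) - static_cast<long>(ref.size());
  if (diff > 0) return {"INS", diff};
  if (diff < 0) return {"DEL", -diff};
  return {"", 0};  // equal lengths: not resolved by this simple sketch
}
```

For example, a record with SVTYPE=DEL and SVLEN=-50 yields a 50 bp deletion in this sketch, even though its ALT is symbolic.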

Usage

read_vcf_multisamps_cpp(
  filename,
  use_gz,
  min_sv_size = 10L,
  shorten_ref = TRUE,
  shorten_alt = TRUE,
  check_inv = FALSE
)

Arguments

filename

the path to the VCF file (unzipped or gzipped).

use_gz

is the VCF file gzipped?

min_sv_size

minimum variant size, in bp, to keep. Shorter variants are skipped. Default is 10.
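
A minimal sketch of the size filter, assuming the variant size is the absolute length difference between REF and ALT (the reader may instead take the size from INFO when available). keep_variant is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical filter: size is assumed to be |len(ALT) - len(REF)|;
// variants shorter than min_sv_size are skipped (returns false).
bool keep_variant(const std::string& ref, const std::string& alt,
                  long min_sv_size = 10) {
  long size = std::labs(static_cast<long>(alt.size()) -
                        static_cast<long>(ref.size()));
  return size >= min_sv_size;
}
```

With the default of 10, a 1 bp insertion such as REF=A, ALT=AC is skipped, while a 10 bp insertion is kept.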

shorten_ref

should the REF sequence be shortened to its first 10 bp? Default is TRUE.

shorten_alt

should the ALT sequence be shortened to its first 10 bp? Default is TRUE.
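
A minimal sketch of what shortening means for shorten_ref and shorten_alt, assuming sequences are simply truncated to their first 10 bp. shorten_seq is a hypothetical helper, and the actual reader may mark truncated sequences differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: keep at most the first n bp of a sequence.
// substr() returns the whole string when it is shorter than n.
std::string shorten_seq(const std::string& seq, std::size_t n = 10) {
  return seq.substr(0, n);
}
```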

check_inv

guess whether a variant is an inversion by aligning REF with the reverse complement of ALT. If they are more than 80% similar (and both REF and ALT are longer than 10 bp), the variant is classified as INV. Default is FALSE.
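
The inversion check can be sketched as follows. The reverse complement is standard, but the similarity score used here (one minus the unit-cost edit distance divided by the longer sequence length) is an assumption, since the exact alignment scoring of the reader is not documented on this page. rev_comp, edit_distance and looks_like_inversion are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Reverse complement of a DNA sequence; non-ACGT characters become 'N'.
std::string rev_comp(const std::string& seq) {
  std::string rc(seq.rbegin(), seq.rend());
  for (char& c : rc) {
    switch (std::toupper(static_cast<unsigned char>(c))) {
      case 'A': c = 'T'; break;
      case 'C': c = 'G'; break;
      case 'G': c = 'C'; break;
      case 'T': c = 'A'; break;
      default: c = 'N';
    }
  }
  return rc;
}

// Levenshtein distance (unit-cost global alignment), two-row DP.
std::size_t edit_distance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical INV test: REF and ALT both > 10 bp and
// similarity(REF, revcomp(ALT)) > 0.8, using the assumed score above.
bool looks_like_inversion(const std::string& ref, const std::string& alt) {
  if (ref.size() <= 10 || alt.size() <= 10) return false;
  std::string rc = rev_comp(alt);
  double d = static_cast<double>(edit_distance(ref, rc));
  double sim = 1.0 - d / static_cast<double>(std::max(ref.size(), rc.size()));
  return sim > 0.8;
}
```

The quadratic alignment keeps the sketch simple but can be costly for very long sequences, which is one reason a check like this is worth leaving off by default.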

Details

Multi-allelic records are split into one variant per ALT allele, and the allele count of each is computed across all samples.
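
A minimal sketch of the splitting and per-allele counting, assuming each sample's GT subfield has already been extracted (e.g. "0/1" or "1|2") and that missing calls (".") are not counted. split_alt and allele_count are hypothetical helpers, not functions exported by sveval.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated ALT field into individual alleles.
std::vector<std::string> split_alt(const std::string& alt) {
  std::vector<std::string> alleles;
  std::stringstream ss(alt);
  std::string a;
  while (std::getline(ss, a, ',')) alleles.push_back(a);
  return alleles;
}

// Count, across all samples, the haplotypes carrying allele `allele_idx`
// (1 = first ALT). Accepts "/" and "|" separators; "." is skipped.
int allele_count(const std::vector<std::string>& gts, int allele_idx) {
  int count = 0;
  for (const std::string& gt : gts) {
    std::string field;
    for (std::size_t i = 0; i <= gt.size(); ++i) {
      if (i == gt.size() || gt[i] == '/' || gt[i] == '|') {
        if (!field.empty() && field != "." && std::stoi(field) == allele_idx) {
          ++count;
        }
        field.clear();
      } else {
        field += gt[i];
      }
    }
  }
  return count;
}
```

For example, with GT values 0/1, 1|2, ./. and 2/2, the first ALT allele has a count of 2 and the second a count of 3.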

Value

data.frame with variant and genotype information

Author(s)

Jean Monlong


jmonlong/sveval documentation built on July 31, 2023, 7:50 p.m.