vcf_normalization: Normalize VCF file to bi-allelic variants


View source: R/vcf.R

Description

Converting VCF files to plink format has never been easier. However, two intrinsic limitations of the plink format cause problems. First, variants in a plink file are bi-allelic only, while variants in a VCF file can be multi-allelic. Second, the plink format makes indel definitions ambiguous. For example, is the variant shown under Details an insertion or a deletion compared to the GRCh37 reference?

Usage

vcf_normalization(vcf.file, ref.file, output.file,
  bcftools.exec = "bcftools", num.threads)

Arguments

vcf.file

[string]
The input VCF file path.

ref.file

[string]
A human reference genome FASTA file to normalize indels against.

output.file

[string]
The output VCF file path.

bcftools.exec

[string]
Path of bcftools executable.

num.threads

[int]
Number of CPUs usable by bcftools. Default is determined by SLURM environment variables and is at least 1.
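
The default for num.threads is described only in words. A minimal sketch of how such a default could be derived in R follows. It assumes the SLURM variable consulted is SLURM_CPUS_PER_TASK (the package may read a different one), and default_threads is a hypothetical helper name, not part of the package.

```r
# Hypothetical helper, not part of the package: derive a thread count from
# the SLURM environment, falling back to 1 outside of a SLURM job.
default_threads <- function() {
  # Assumption: SLURM_CPUS_PER_TASK is the relevant variable.
  raw <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")
  # Non-numeric or empty values become NA instead of raising a warning.
  n <- suppressWarnings(as.integer(raw))
  # Guarantee "at least 1", as the argument description states.
  if (is.na(n) || n < 1L) 1L else n
}
```

Under this assumption, the result could be passed explicitly, for example as num.threads = default_threads().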

Details

Example variant referred to in the Description (chromosome, position, and two alleles):

20 31022441 A AG

There is no way to tell from a plink file, as the plink format does not record this information.
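
The ambiguity can be illustrated with a short R sketch. classify_indel is a hypothetical helper, not part of the package, and the "substitution" label for equal-length alleles is a simplification made here for illustration. With an ordered (reference, alternative) pair the answer is clear; with the same two alleles in unknown order, both answers are possible.

```r
# Hypothetical helper: classify a variant from an ordered (REF, ALT) pair,
# which is the information a VCF record carries.
classify_indel <- function(ref, alt) {
  if (nchar(alt) > nchar(ref)) {
    "insertion"
  } else if (nchar(alt) < nchar(ref)) {
    "deletion"
  } else {
    "substitution"  # simplification: equal-length alleles
  }
}

# Same two alleles, both possible orderings.
alleles <- c("A", "AG")
as_given <- classify_indel(alleles[1], alleles[2])  # A is the reference
swapped  <- classify_indel(alleles[2], alleles[1])  # AG is the reference

# The two orderings disagree, so the allele pair alone cannot decide.
ambiguous <- as_given != swapped
```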

Keeping this in mind, we are going to split multi-allelic variants into bi-allelic ones, left-normalize indels, and assign unique identifiers.
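
The three steps can be sketched as two bcftools calls driven from R. This is a sketch under stated assumptions, not necessarily what vcf_normalization runs internally: the helper names build_bcftools_args and run_normalization_sketch, the intermediate file tmp.file, the compressed output type, and the identifier format %CHROM:%POS:%REF:%ALT are all chosen here for illustration.

```r
# Hypothetical helper: build argument vectors for the two bcftools calls.
build_bcftools_args <- function(vcf.file, ref.file, output.file,
                                tmp.file, num.threads = 1L) {
  threads <- as.character(num.threads)
  list(
    norm = c(
      "norm",
      "-m", "-any",                 # split multi-allelic sites into bi-allelic records
      "-f", shQuote(ref.file),      # left-align and normalize indels against the reference
      "--threads", threads,
      "-O", "z",                    # assumption: compressed VCF output
      "-o", shQuote(tmp.file),
      shQuote(vcf.file)
    ),
    annotate = c(
      "annotate",
      "--set-id", shQuote("%CHROM:%POS:%REF:%ALT"),  # assumed identifier format
      "--threads", threads,
      "-O", "z",
      "-o", shQuote(output.file),
      shQuote(tmp.file)
    )
  )
}

# Hypothetical runner: executes both steps and returns the captured output
# as a character vector. Requires bcftools to be installed.
run_normalization_sketch <- function(vcf.file, ref.file, output.file,
                                     bcftools.exec = "bcftools",
                                     num.threads = 1L) {
  tmp.file <- tempfile(fileext = ".vcf.gz")
  on.exit(unlink(tmp.file))
  args <- build_bcftools_args(vcf.file, ref.file, output.file,
                              tmp.file, num.threads)
  c(
    system2(bcftools.exec, args$norm, stdout = TRUE, stderr = TRUE),
    system2(bcftools.exec, args$annotate, stdout = TRUE, stderr = TRUE)
  )
}
```

Splitting before assigning identifiers matters in this sketch: once every record is bi-allelic, %ALT expands to a single allele, so the identifier is built from one alternative allele per record.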

Value

Captured system output as character vector.


imbs-hl/imbs documentation built on Sept. 6, 2019, 11:05 p.m.