ReadVCF: Read VCF file

ReadVCF    R Documentation

Read VCF file

Description

This function reads and formats VCF files.

Usage

ReadVCF(
  File,
  AlleleDepthField = NULL,
  AlleleDepthType = "alleles",
  MaxMarkerMissing = 0.2,
  MaxIndMissing = 0.2,
  NbThreads = 0L,
  Verbose = TRUE
)

Arguments

File

Path to VCF file

AlleleDepthField

Name of the FORMAT field in the VCF that holds allele read depths (e.g. "AD"), used to compute allele depth proportions. If NULL (default), dosages are obtained from the GT field.

AlleleDepthType

Allele read depth type: either the depths associated with the REF and ALT alleles ("alleles", the default), or the depths associated with each haplotype reported in the GT field ("haplotypes"). The latter can be used for a phased VCF generated by HaploCharmer.

MaxMarkerMissing

Maximum proportion of missing values allowed per marker; markers above this proportion are discarded

MaxIndMissing

Maximum proportion of missing values allowed per individual; individuals above this proportion are discarded (applied after marker filtering)

NbThreads

Number of threads to use (positive integer). The default value of 0 automatically uses all available threads

Verbose

A logical value indicating whether detailed information should be printed

Details

The function ReadVCF() reads Variant Call Format (VCF) files.

By default, dosages are obtained from the GT field of the VCF. However, genotypic data are commonly "diploidized" in polyploid species, meaning that all heterozygous classes are aggregated into one class. In this case, it is recommended to specify the allele depth field of the VCF to work with allele read depth ratios instead, e.g. AlleleDepthField = "AD".
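To illustrate why read depth ratios carry more information than a diploidized GT field, the following is a minimal sketch in base R, not the package's implementation. It assumes a biallelic marker whose AD entries are "REF,ALT" read counts; the variable names and the choice of the ALT allele as numerator are assumptions made for the example.

```r
# Hypothetical AD values for one marker across four individuals ("REF,ALT" read counts).
ad <- c("10,0", "6,6", "9,3", ".")

# Split each entry into REF and ALT depths; a missing entry (".") becomes NA.
depths <- t(vapply(strsplit(ad, ",", fixed = TRUE), function(x) {
  if (length(x) != 2L) return(c(NA_real_, NA_real_))
  suppressWarnings(as.numeric(x))
}, numeric(2)))
colnames(depths) <- c("REF", "ALT")

# ALT allele depth ratio; individuals with no reads or missing depths get NA.
total <- rowSums(depths)
alt_ratio <- ifelse(total > 0, depths[, "ALT"] / total, NA_real_)
alt_ratio
```

In this toy example the second and third individuals would both be called heterozygous in a diploidized GT field, whereas their depth ratios differ.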

If the VCF has been generated by the HaploCharmer workflow, where alleles are phased within blocks, the read depths are associated with haplotypes rather than with alleles. In this case, the read depth ratio associated with each allele can be obtained by specifying AlleleDepthType = "haplotypes".

Missing data can be filtered for markers and individuals by setting the maximum proportions MaxMarkerMissing and MaxIndMissing, respectively. Marker filtering is applied first, then individual filtering.
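The two-step order matters because the missing proportion of each individual is evaluated on the markers that remain. The sketch below shows this logic in base R on a toy matrix. It is an illustration under assumptions (markers in rows, individuals in columns, hypothetical variable names and thresholds), not the code used by ReadVCF().

```r
# Toy dosage matrix: markers in rows, individuals in columns (layout is an assumption).
geno <- matrix(
  c( 0,  1, 2, NA,
    NA, NA, 1, NA,
     2,  2, 0, NA,
     1,  0, 1,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("M", 1:4), paste0("Ind", 1:4))
)
max_marker_missing <- 0.5
max_ind_missing <- 0.5

# Step 1: discard markers whose missing proportion exceeds the threshold.
marker_missing <- rowMeans(is.na(geno))
geno <- geno[marker_missing <= max_marker_missing, , drop = FALSE]

# Step 2: discard individuals, using proportions recomputed on the kept markers.
ind_missing <- colMeans(is.na(geno))
geno <- geno[, ind_missing <= max_ind_missing, drop = FALSE]
geno
```

Here marker M2 is removed in the first step, and Ind4 is removed in the second step because two of its three remaining values are missing.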

By default, all available threads (CPU cores) are used, but a specific number can be set with NbThreads.
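The convention "0 means all available threads" can be sketched as follows. The helper resolve_threads() is hypothetical and uses parallel::detectCores() from base R; how ReadVCF() actually detects the number of threads is not stated here.

```r
# Hypothetical helper: map a requested thread count to the number actually used.
resolve_threads <- function(nb_threads = 0L) {
  stopifnot(is.numeric(nb_threads), length(nb_threads) == 1L, nb_threads >= 0)
  if (nb_threads == 0) {
    # 0 requests every core that can be detected; fall back to 1 if detection fails.
    available <- parallel::detectCores()
    if (is.na(available)) 1L else as.integer(available)
  } else {
    as.integer(nb_threads)
  }
}

resolve_threads(2L)  # an explicit request is returned unchanged
resolve_threads()    # machine dependent
```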

Value

A list of three items: a filtered list of genotyping matrices (Geno), a data frame with variant information (MarkerInfo), and a data frame to be used as a genetic map proxy for local admixture inference (GeneticMap). The genetic map proxy is based on physical positions, assuming that all chromosomes are 100 cM long.
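One way to build such a proxy is to rescale physical positions linearly within each chromosome. The sketch below does this in base R under explicit assumptions: the column names are illustrative and not those of MarkerInfo or GeneticMap, and chromosome length is approximated by the largest marker position, which may differ from what the package does.

```r
# Hypothetical marker table; column names are illustrative only.
markers <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  pos   = c(1e6, 5e6, 10e6, 2e6, 4e6)
)

# Scale positions so that the last marker of each chromosome sits at 100 cM.
# Assumption: chromosome length is approximated by the largest marker position.
markers$cM <- 100 * markers$pos / ave(markers$pos, markers$chrom, FUN = max)
markers
```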

See Also

ReadHPA() to read Haplotype Presence-Absence files generated by HaploCharmer.

Examples

## Read test VCF
vcf_path <- system.file("extdata", "Test.vcf", package = "AdmixPoly")
DataVCF <- AdmixPoly::ReadVCF(File = vcf_path, NbThreads=1)

AdmixPoly documentation built on June 18, 2026, 1:06 a.m.