vcf2ploidy: Estimate ploidy directly from VCF files


View source: R/vcf2ploidy.R

Description

Read in a VCF file, convert it to HAD format, and estimate ploidy in a single function call.

Usage

vcf2ploidy(
  filename,
  skip_lines = NULL,
  remove_double_hets = FALSE,
  props = c(0.25, 0.33, 0.5, 0.66, 0.75),
  mcmc.nchain = 2,
  mcmc.steps = 10000,
  mcmc.burnin = 1000,
  mcmc.thin = 2,
  train = FALSE,
  pl = NA,
  set = NA,
  nclasses = 2,
  pcs = 1:2
)

Arguments

filename

A character string of the data file path.

skip_lines

The number of metadata lines to skip at the top of the VCF file. If left NULL, the metadata lines are counted automatically by the count_metadata_lines function. That function reads in the entire file, so if you have a large file and already know how many metadata lines it contains, you can save some run time by supplying that number here.
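
For a large file, the count can be obtained without reading everything. The following is a minimal sketch, assuming the usual VCF layout in which metadata lines start with "##" and come before the "#CHROM" header line. count_leading_meta is a hypothetical helper and not part of the package, and the example file is made up.

```r
# Hypothetical helper (not part of VCF2Ploidy): count the "##" metadata
# lines at the top of a VCF file, reading it in small chunks and stopping
# at the first line that is not metadata.
count_leading_meta <- function(path, chunk = 100L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break          # end of file reached
    is_meta <- startsWith(lines, "##")
    if (all(is_meta)) {
      n <- n + length(lines)                # whole chunk is metadata
    } else {
      n <- n + which(!is_meta)[1] - 1L      # metadata lines before first non-metadata line
      break
    }
  }
  n
}

# Tiny made-up VCF written to a temporary file for illustration.
vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "##source=example",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1",
  "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:10,12"
), vcf)

n_meta <- count_leading_meta(vcf)
```

The resulting count could then be passed as skip_lines, provided it matches how the function treats the "#CHROM" header line, which this page does not specify.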

remove_double_hets

Logical indicating whether double heterozygous loci should be treated as missing data. Setting this to TRUE is intended to fix issues with gbs2ploidy falsely labeling triploids.

props

a vector containing valid allelic proportions given the expected cytotypes present in the sample.
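
As an illustration of where such proportions can come from, the sketch below assumes that a heterozygous locus in an individual of ploidy n can carry k copies of one allele, for k = 1, ..., n - 1, giving expected proportions k/n. het_props is a hypothetical helper, and the set of cytotypes is made up for the example. Note that rounding 2/3 gives 0.67, whereas the default shown in Usage lists 0.66.

```r
# Hypothetical helper: expected allelic proportions at a heterozygous
# locus for a given ploidy (k copies out of `ploidy`, k = 1..ploidy - 1).
het_props <- function(ploidy) seq_len(ploidy - 1) / ploidy

# Cytotypes expected in the sample (an assumption for this example):
# diploids, triploids and tetraploids.
ploidies <- c(2, 3, 4)

# Pool the proportions over all cytotypes, round, and drop duplicates.
props <- sort(unique(round(unlist(lapply(ploidies, het_props)), 2)))
props
```

A sample expected to contain only diploids and tetraploids would give c(0.25, 0.5, 0.75) under the same assumption, which matches the second call in Examples.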

mcmc.nchain

number of chains for MCMC.

mcmc.steps

number of post-burn-in iterations for each chain.

mcmc.burnin

number of iterations to discard from each chain as burn-in.

mcmc.thin

thinning interval for MCMC.
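
To see how the four MCMC settings interact, the arithmetic below uses the default values from Usage. It assumes that burn-in iterations are run in addition to mcmc.steps and that thinning keeps every mcmc.thin-th post-burn-in iteration; the exact bookkeeping depends on the underlying sampler and is not confirmed by this page.

```r
# Default settings from the Usage section.
mcmc.nchain <- 2
mcmc.steps  <- 10000
mcmc.burnin <- 1000
mcmc.thin   <- 2

# Assumed bookkeeping (see the caveats above).
iters_per_chain <- mcmc.burnin + mcmc.steps        # iterations run per chain
kept_per_chain  <- floor(mcmc.steps / mcmc.thin)   # samples retained per chain
kept_total      <- mcmc.nchain * kept_per_chain    # samples retained overall
```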

train

a boolean specifying whether or not a training set with known ploidy should be used.

pl

a vector of known ploidies with one entry per individual (use ‘NA’ for individuals with unknown ploidy); only used if train == TRUE.

set

indices for the training set; only used if train == TRUE.
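
A minimal sketch of how the pl and set arguments might be put together for a training run. The six individuals and their ploidies are made up, and the sketch assumes that set holds the positions of the individuals with known ploidy, in the same order as the samples in the VCF file.

```r
# Made-up example: six individuals, ploidy known for three of them.
ids <- c("ind1", "ind2", "ind3", "ind4", "ind5", "ind6")
pl  <- c(2, NA, 4, NA, 2, NA)    # NA marks individuals with unknown ploidy

# Positions of the individuals with known ploidy form the training set.
set <- which(!is.na(pl))

# The call would then look like this (not run here):
# vcf2ploidy("./example.vcf", train = TRUE, pl = pl, set = set)
```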

nclasses

the number of cytotypes expected.

pcs

a vector giving the principal components (PCs) to use for discriminant analysis (DA).

Value

vcf2ploidy returns a list with three components:

pp A matrix with assignment probabilities for each individual (rows) to each group (columns); the first column gives the ids provided by the user. Only individuals that were not part of the training set are included.

pcwghts A matrix with the variable loadings (PC weights) from the ordination of residual heterozygosity and allelic proportions. Columns correspond to PCs in order of decreasing eigenvalue (i.e., the PC with the largest eigenvalue is first).

pcscrs A matrix of PC scores from the ordination of residual heterozygosity and allelic proportions. Columns correspond to PCs in order of decreasing eigenvalue (i.e., the PC with the largest eigenvalue is first).
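
The pp component can be summarized into one call per individual. The sketch below works on a mock object shaped like the description above, with an id column followed by one probability column per group. All values, the column names g1 and g2, and the 0.9 cutoff are made up for illustration.

```r
# Mock result with only the pp component (values are made up; a real
# result would also contain pcwghts and pcscrs).
res <- list(
  pp = cbind(id = 1:4,
             g1 = c(0.98, 0.10, 0.55, 0.01),
             g2 = c(0.02, 0.90, 0.45, 0.99))
)

probs <- res$pp[, -1, drop = FALSE]             # drop the id column
best  <- max.col(probs, ties.method = "first")  # most probable group per row

calls <- data.frame(
  id       = res$pp[, 1],
  group    = colnames(probs)[best],
  max_prob = probs[cbind(seq_len(nrow(probs)), best)],
  stringsAsFactors = FALSE
)

# Individuals whose best assignment falls below an arbitrary 0.9 cutoff.
uncertain <- calls$id[calls$max_prob < 0.9]
```

Which ploidy each group column represents is not stated here, so the group labels should be checked against individuals of known ploidy before they are interpreted.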

Examples

## Not run: vcf2ploidy("./example.vcf")
## Not run: vcf2ploidy("./example.vcf", props=c(0.25, 0.5, 0.75))

dandewaters/VCF2Ploidy documentation built on Jan. 17, 2021, 2:12 p.m.