capture_diversity.VCF: Estimate Minimum Number of Individuals to Sample to Capture...

View source: R/capture_diversity.VCF.R

capture_diversity.VCF    R Documentation

Estimate Minimum Number of Individuals to Sample to Capture Population Genomic Diversity (VCF)

Description

This function estimates the number of individuals that must be sampled from a population to capture a desired percentage of its genomic diversity. VCF files can be either uncompressed or gzipped. All samples must have the same ploidy, and the VCF must contain GT (genotype) information. This function was adapted from a previously developed Python method (Sandercock et al., 2024; https://github.com/alex-sandercock/Capturing_genomic_diversity/).
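The resampling idea can be illustrated with a minimal base R sketch. It is not castgen's implementation. It assumes that the original Python method adds randomly ordered individuals to a subset, compares the subset's allele frequencies with those of the full population using the squared Pearson correlation (r²), and records the subset size at which r² first reaches the threshold, repeating this over several iterations. The input is assumed to be a dosage matrix (rows = individuals, columns = markers, values 0 to ploidy). The names `allele_freq` and `min_samples_for_r2` are hypothetical.

```r
# Hypothetical sketch, NOT castgen's implementation.
# Assumption: "diversity captured" = squared Pearson correlation (r^2) between
# subset and full-population alternate-allele frequencies.

# geno: numeric matrix, rows = individuals, columns = markers,
#       values = alternate-allele dosage (0..ploidy), NA = missing call
allele_freq <- function(geno, ploidy) {
  # Per-marker frequency, ignoring missing calls (all-missing markers give NaN)
  colSums(geno, na.rm = TRUE) / (ploidy * colSums(!is.na(geno)))
}

min_samples_for_r2 <- function(geno, ploidy, r2_threshold = 0.9,
                               iterations = 10, batch = 1) {
  n <- nrow(geno)
  pop_freq <- allele_freq(geno, ploidy)
  sizes <- integer(iterations)
  for (it in seq_len(iterations)) {
    ord <- sample.int(n)  # random order in which individuals are added
    size <- 0L
    repeat {
      size <- min(size + batch, n)  # add `batch` individuals per step
      sub <- geno[ord[seq_len(size)], , drop = FALSE]
      # cor() returns NA (with a warning) when subset frequencies are constant
      r2 <- suppressWarnings(
        cor(allele_freq(sub, ploidy), pop_freq, use = "complete.obs")
      )^2
      if ((!is.na(r2) && r2 >= r2_threshold) || size == n) break
    }
    sizes[it] <- size
  }
  sizes  # subset size at which the threshold was reached, per iteration
}
```

For example, `mean(min_samples_for_r2(geno, ploidy = 2, r2_threshold = 0.95))` gives the average minimum sample size across iterations. Recomputing frequencies from scratch at each step keeps the sketch short; keeping a running sum of dosages would be faster for large marker sets.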

Usage

capture_diversity.VCF(
  vcf,
  ploidy,
  r2_threshold = 0.9,
  iterations = 10,
  sample_list = NULL,
  parallel = FALSE,
  batch = 1,
  save.result = TRUE,
  verbose = TRUE
)

Arguments

vcf

Path to VCF file (.vcf or .vcf.gz) with genotype information
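Because `vcf` may be plain or gzipped and only GT calls are required, the following base R sketch shows one way such a file could be turned into a dosage matrix. The helpers `read_vcf_dosage` and `gt_to_dosage` are hypothetical and not part of castgen; they assume GT strings separated by `/` or `|` and count every non-reference allele (including multi-allelic ALT codes) toward the dosage.

```r
# Hypothetical helpers, NOT part of castgen.
# Convert GT strings such as "0/1", "1|1" or "0/0/1/1" to alternate-allele
# dosages; any missing allele (".") gives NA.
gt_to_dosage <- function(gt) {
  alleles <- strsplit(gt, "[/|]")
  vapply(alleles, function(a) {
    if (any(a == ".")) return(NA_real_)
    as.numeric(sum(as.integer(a) > 0))  # count non-reference alleles
  }, numeric(1))
}

# Read a VCF (plain or gzipped) into a matrix: rows = samples, columns = markers.
read_vcf_dosage <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  header <- strsplit(lines[startsWith(lines, "#CHROM")], "\t")[[1]]
  samples <- header[10:length(header)]
  fields <- strsplit(lines[!startsWith(lines, "#")], "\t")
  dos <- vapply(fields, function(f) {
    gt_idx <- match("GT", strsplit(f[9], ":")[[1]])
    if (is.na(gt_idx)) stop("FORMAT column lacks GT at ", f[1], ":", f[2])
    gts <- vapply(strsplit(f[10:length(f)], ":"), `[`, character(1), gt_idx)
    gt_to_dosage(gts)
  }, numeric(length(samples)))
  # vapply() stacks one column per marker, so rows are already samples
  marker_ids <- vapply(fields, function(f) paste(f[1], f[2], sep = ":"),
                       character(1))
  matrix(dos, nrow = length(samples), dimnames = list(samples, marker_ids))
}
```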

ploidy

The ploidy of the species being analyzed

r2_threshold

The proportion of genomic diversity to capture, given as an r² value between 0 and 1 (default = 0.9)
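Assuming, as in the Python method this function was adapted from, that captured diversity is measured as the squared Pearson correlation between allele frequencies in a sampled subset and in the full population, the threshold can be read as follows. The symbols are introduced here for illustration only: $d_{ij}$ is the alternate-allele dosage of individual $i$ at marker $j$, $k$ is the ploidy, $S$ is the sampled subset, $P$ is the full population, $m$ is the number of markers, and $\bar p$ denotes a mean over markers.

```latex
p_j^{(S)} = \frac{1}{k\,|S|}\sum_{i \in S} d_{ij}, \qquad
p_j^{(P)} = \frac{1}{k\,|P|}\sum_{i \in P} d_{ij}

r^2 = \left(
  \frac{\sum_{j=1}^{m} \bigl(p_j^{(S)} - \bar p^{(S)}\bigr)\bigl(p_j^{(P)} - \bar p^{(P)}\bigr)}
       {\sqrt{\sum_{j=1}^{m} \bigl(p_j^{(S)} - \bar p^{(S)}\bigr)^2}\,
        \sqrt{\sum_{j=1}^{m} \bigl(p_j^{(P)} - \bar p^{(P)}\bigr)^2}}
\right)^2
```

Under this reading, the minimum sample size is the smallest $|S|$ for which $r^2 \ge$ `r2_threshold`.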

iterations

The number of iterations to perform to estimate the average result (default = 10)

sample_list

The list of samples to subset from the dataset (optional)

parallel

Run the analysis in parallel (TRUE/FALSE) (default = FALSE)

batch

The number of samples to draw in each bootstrap sampling iteration (default = 1)

save.result

Whether to save the results to a .txt file (default = TRUE)

verbose

Whether to print the results to the console (default = TRUE)

Value

A data.frame containing the minimum number of samples required to match or exceed the input r2_threshold

References

A.M. Sandercock, J.W. Westbrook, Q. Zhang, & J.A. Holliday, A genome-guided strategy for climate resilience in American chestnut restoration populations, Proc. Natl. Acad. Sci. U.S.A. 121 (30) e2403505121, https://doi.org/10.1073/pnas.2403505121 (2024).

Examples

# Example with a diploid VCF

# Example VCF file
vcf_file <- system.file("diploid_example.vcf.gz", package = "castgen")

# Estimate the number of samples required to capture 95% of the population's genomic diversity
result <- capture_diversity.VCF(vcf_file,
                                 ploidy = 2,
                                 r2_threshold = 0.95,
                                 iterations = 10,
                                 save.result = FALSE,
                                 parallel = FALSE,
                                 verbose = FALSE)

# View results
print(result)


castgen documentation built on April 3, 2025, 9:21 p.m.