Heterozygosity | R Documentation |
This function calculates the rate of heterozygosity for samples in a VCF file, that is, the proportion of genotypes that are heterozygous.
For batch processing, it uses process_vcf_in_batches. For windowed analysis, it uses a similar approach tailored to specific genomic windows (process_vcf_in_windows).
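As an illustration of the quantity being computed, here is a minimal base R sketch of observed heterozygosity on a toy genotype matrix. The helpers is_het and observed_het are hypothetical and are not part of GenoPop; the treatment of missing calls (ignored per locus) and the per-locus averaging are assumptions, so the package's own results may differ in detail.

```r
# Toy genotype matrix: rows are variants, columns are individuals.
# Values are VCF-style GT strings; "./." marks a missing call.
gt <- matrix(
  c("0/0", "0/1", "1/1",
    "0/1", "0|1", "./.",
    "1/1", "0/1", "0/0",
    "0/1", "0/0", "1|0"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("var", 1:4), c("ind1", "ind2", "ind3"))
)

# Hypothetical helper (not a GenoPop function): TRUE for heterozygous calls,
# FALSE for homozygous calls, NA for missing or non-diploid calls.
is_het <- function(g) {
  alleles <- strsplit(as.vector(g), "[/|]")
  vapply(alleles, function(a) {
    if (length(a) != 2 || any(a == ".")) return(NA)
    a[1] != a[2]
  }, logical(1))
}

# Hypothetical helper: proportion of heterozygous calls per locus,
# averaged over loci. Assumes missing calls are simply ignored.
observed_het <- function(gt, exclude_ind = NULL) {
  if (!is.null(exclude_ind)) {
    gt <- gt[, !(colnames(gt) %in% exclude_ind), drop = FALSE]
  }
  het <- matrix(is_het(gt), nrow = nrow(gt), dimnames = dimnames(gt))
  per_locus <- rowMeans(het, na.rm = TRUE)
  mean(per_locus, na.rm = TRUE)
}

ho_all <- observed_het(gt)                          # 7/12, about 0.583
ho_sub <- observed_het(gt, exclude_ind = "ind3")    # 0.625
```

Dropping ind3 changes the result because each per-locus proportion is then taken over the remaining individuals only, which mirrors the role of exclude_ind described below.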
Heterozygosity(
  vcf_path,
  batch_size = 10000,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt",
  window_size = NULL,
  skip_size = NULL,
  exclude_ind = NULL
)
vcf_path |
Path to the VCF file. |
batch_size |
The number of variants to be processed in each batch (used in batch mode only, default of 10,000 should be suitable for most use cases). |
threads |
Number of threads to use for parallel processing. |
write_log |
Logical, indicating whether to write progress logs. |
logfile |
Path to the log file where progress will be logged. |
window_size |
Size of the window for windowed analysis in base pairs (optional).
When specified together with skip_size, the function runs in window mode. |
skip_size |
Number of base pairs to skip between windows (optional).
Used in conjunction with window_size. |
exclude_ind |
Optional vector of individual IDs to exclude from the analysis. If provided, the function will remove these individuals from the genotype matrix before applying the custom function. Default is NULL, meaning no individuals are excluded. |
In batch mode (no window_size or skip_size provided): Observed heterozygosity rate averaged over all loci.
In window mode (window_size and skip_size provided): A data frame with columns 'Chromosome', 'Start', 'End', and 'Ho', representing the observed heterozygosity rate within each window.
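To show how the window-mode result might be used downstream, here is a small base R sketch that works on a made-up data frame with the four documented columns. The values, the window coordinates, and the presence of an NA window are illustrative assumptions only; they are not GenoPop output and do not imply how the package lays out its windows.

```r
# Made-up stand-in for a window-mode result (illustrative values only).
ho_windows <- data.frame(
  Chromosome = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  Start      = c(1, 200001, 400001, 1, 200001),
  End        = c(100000, 300000, 500000, 100000, 300000),
  Ho         = c(0.21, 0.35, NA, 0.18, 0.26)
)

# Mean Ho per chromosome; the formula interface drops rows with NA in Ho.
per_chrom <- aggregate(Ho ~ Chromosome, data = ho_windows, FUN = mean)

# Window with the highest Ho; which.max() skips NA values.
top_window <- ho_windows[which.max(ho_windows$Ho), ]

print(per_chrom)
print(top_window)
```

Because the result is an ordinary data frame, the same pattern extends to filtering by region or plotting Ho against the window start position.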
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
# Batch mode example
Ho <- Heterozygosity(vcf_file)
# Window mode example
Ho_windows <- Heterozygosity(vcf_file, window_size = 100000, skip_size = 50000)