FixedSites | R Documentation |
This function counts the number of sites fixed for the alternative allele ("1") in a VCF file.
It processes the file in two modes: the entire file at once or in specified windows across the genome.
For batch processing, it uses process_vcf_in_batches. For windowed analysis, it uses a similar approach tailored to specific genomic windows (process_vcf_in_windows).
FixedSites(
vcf_path,
threads = 1,
write_log = FALSE,
logfile = "log.txt",
batch_size = 10000,
window_size = NULL,
skip_size = NULL,
exclude_ind = NULL
)
vcf_path |
Path to the VCF file. |
threads |
Number of threads to use for parallel processing. |
write_log |
Logical, indicating whether to write progress logs. |
logfile |
Path to the log file where progress will be logged. |
batch_size |
The number of variants to process in each batch (used in batch mode only; the default of 10,000 should suit most use cases). |
window_size |
Size of the window for windowed analysis in base pairs (optional).
When specified, |
skip_size |
Number of base pairs to skip between windows (optional).
Used in conjunction with |
exclude_ind |
Optional vector of individual IDs to exclude from the analysis. If provided, the function will remove these individuals from the genotype matrix before applying the custom function. Default is NULL, meaning no individuals are excluded. |
The function has two modes of operation:
Batch Mode: Processes the entire VCF file in batches to count the total number of fixed sites for the alternative allele. Suitable for a general overview of the entire dataset.
Window Mode: Processes the VCF file in windows of a specified size and skip distance. This mode is useful for identifying regions with high numbers of fixed sites, which could indicate selective sweeps or regions of low recombination.
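To make the notion of a site "fixed for the alternative allele" concrete, here is a minimal sketch on a small hand-made genotype matrix. The helper count_fixed_alt is hypothetical and is not part of GenoPop; it also assumes that missing genotypes ("./.") are ignored, which may differ from how FixedSites treats them. The last call mimics the effect of exclude_ind by dropping a column before counting.

```r
# Hypothetical helper (not a GenoPop function): counts sites at which every
# non-missing allele is the alternative allele "1".
# Assumption: missing alleles (".") are ignored rather than disqualifying a site.
count_fixed_alt <- function(gt) {
  is_fixed <- apply(gt, 1, function(site) {
    # Split genotypes such as "1|1" or "0/1" into single alleles
    alleles <- unlist(strsplit(site, "[|/]"))
    alleles <- alleles[alleles != "."]
    length(alleles) > 0 && all(alleles == "1")
  })
  sum(is_fixed)
}

# Toy genotype matrix: rows are sites, columns are individuals
gt <- matrix(
  c("1|1", "1|1", "1|1",   # fixed for the alternative allele
    "0|1", "1|1", "1|1",   # ind1 still carries the reference allele
    "1/1", "./.", "1/1",   # fixed among the non-missing genotypes
    "0|0", "0|0", "0|0"),  # fixed for the reference allele
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("site", 1:4), paste0("ind", 1:3))
)

n_all <- count_fixed_alt(gt)

# Dropping an individual first, analogous to exclude_ind = "ind1"
n_without_ind1 <- count_fixed_alt(gt[, colnames(gt) != "ind1", drop = FALSE])
```

With all three individuals, two sites count as fixed; once ind1 is removed, the second site also qualifies, which shows how excluding individuals can change the result.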
In batch mode (no window_size or skip_size provided): a single integer giving the total number of sites fixed for the alternative allele across the entire VCF file.
In window mode (window_size and skip_size provided): a data frame with columns 'Chromosome', 'Start', 'End', and 'FixedSites', giving the count of fixed sites within each window.
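As a sketch of how the window-mode result might be explored, the snippet below ranks windows by their fixed-site count and adds a per-kilobase density. The data frame is a mock-up with the documented column names; its values are invented for illustration and do not come from any real VCF file, and the column FixedPerKb is a hypothetical addition.

```r
# Mock window-mode result using the documented columns; all values are made up.
fixed_sites_df <- data.frame(
  Chromosome = c("chr1", "chr1", "chr1", "chr2"),
  Start      = c(1, 100001, 200001, 1),
  End        = c(100000, 200000, 300000, 100000),
  FixedSites = c(3, 12, 0, 7)
)

# Rank windows from the highest to the lowest fixed-site count
ranked <- fixed_sites_df[order(-fixed_sites_df$FixedSites), ]

# Normalise by window length (in kb) so windows of unequal size are comparable
window_kb <- (ranked$End - ranked$Start + 1) / 1000
ranked$FixedPerKb <- ranked$FixedSites / window_kb

# Show the two windows with the most fixed sites
head(ranked, 2)
```

Windows near the top of such a ranking are candidates for closer inspection, though a high count alone does not establish a selective sweep.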
# Batch mode example
# Locate the example VCF shipped with the package
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
# Locate its tabix index; index_file is not passed to FixedSites() in this example
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
num_fixed_sites <- FixedSites(vcf_file)
# Window mode example
fixed_sites_df <- FixedSites(vcf_file, window_size = 100000, skip_size = 50000)