Pi: Pi
In GenoPop: Genotype Imputation and Population Genomics Efficiently from Variant Call Formatted (VCF) Files

View source: R/pop_genomics.R

Pi	R Documentation

Pi

Description

This function calculates nucleotide diversity (Pi) for a sample in a VCF file, as defined by Nei & Li, 1979 (https://doi.org/10.1073/pnas.76.10.5269). The formula is equivalent to the one used by vcftools --window-pi (https://vcftools.sourceforge.net/man_latest.html). Missing alleles at a site are handled as in Korunes & Samuk, 2021 (https://doi.org/10.1111/1755-0998.13326).

The function derives the number of monomorphic sites from the sequence length and the number of variants in the VCF file. This assumes that every site not present in the VCF file is invariant. Because variant filtering is common (and necessary), some of the absent sites are in fact filtered variants, so the metric will be underestimated. The alternative would be to require VCF files that include all monomorphic sites, which is impractical for common use cases and significantly increases computational demands. If you know how many sites were filtered out and how many are truly monomorphic, pass the number of monomorphic sites plus the number of polymorphic sites (the number of variants in your VCF) as the sequence length to get the most accurate estimate. (This does not work in window mode, which takes the window size as the sequence length.)

In batch mode, the function uses process_vcf_in_batches. In window mode, it uses a similar approach tailored to specific genomic windows (process_vcf_in_windows).
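The calculation described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not GenoPop's actual implementation: `pi_sketch` is a hypothetical helper, its input is assumed to be a matrix of allele codes (rows are variant sites, columns are allele copies, `NA` marks a missing allele), and sites absent from the matrix are assumed to be invariant and fully genotyped.

```r
# Hypothetical helper, not part of GenoPop.
# alleles: matrix of allele codes, rows = variant sites, columns = allele copies,
#          NA = missing allele.
# seq_length: total number of sites, variant and invariant.
pi_sketch <- function(alleles, seq_length) {
  # Pairwise comparisons at a site with no missing alleles
  full_pairs <- choose(ncol(alleles), 2)

  mismatches <- 0
  comparisons <- 0
  for (i in seq_len(nrow(alleles))) {
    # Allele counts at this site, missing alleles dropped
    counts <- as.vector(table(alleles[i, ], useNA = "no"))
    pairs <- choose(sum(counts), 2)   # comparisons among non-missing alleles
    same <- sum(choose(counts, 2))    # pairs that carry the same allele
    mismatches <- mismatches + (pairs - same)
    comparisons <- comparisons + pairs
  }

  # Assumption: every site not in the matrix is monomorphic
  n_monomorphic <- seq_length - nrow(alleles)
  comparisons <- comparisons + n_monomorphic * full_pairs

  mismatches / comparisons
}

# Toy input: two variant sites, four allele copies, one missing allele
toy <- matrix(c(0, 0, 1, 1,
                0, 1, NA, 0), nrow = 2, byrow = TRUE)
pi_toy <- pi_sketch(toy, seq_length = 10)
```

In this sketch a missing allele reduces the number of comparisons at its own site only, so it lowers neither the numerator nor the denominator elsewhere. If the number of truly monomorphic sites is known, the same function could be called with `seq_length` set to that number plus `nrow(toy)`.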

Usage

Pi(
  vcf_path,
  seq_length,
  batch_size = 10000,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt",
  window_size = NULL,
  skip_size = NULL,
  exclude_ind = NULL
)

Arguments

`vcf_path`	Path to the VCF file.
`seq_length`	Total length of the sequence in number of bases (used in batch mode only).
`batch_size`	The number of variants to be processed in each batch (used in batch mode only, default of 10,000 should be suitable for most use cases).
`threads`	Number of threads to use for parallel processing.
`write_log`	Logical, indicating whether to write progress logs.
`logfile`	Path to the log file where progress will be logged.
`window_size`	Size of the window for windowed analysis in base pairs (optional). When specified, `skip_size` must also be provided.
`skip_size`	Number of base pairs to skip between windows (optional). Used in conjunction with `window_size` for windowed analysis.
`exclude_ind`	Optional vector of individual IDs to exclude from the analysis. If provided, the function will remove these individuals from the genotype matrix before applying the custom function. Default is NULL, meaning no individuals are excluded.

Value

In batch mode (no window_size or skip_size provided): Nucleotide diversity (Pi) across the sequence. In window mode (window_size and skip_size provided): A data frame with columns 'Chromosome', 'Start', 'End', and 'Pi', representing the nucleotide diversity within each window.
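As a rough sketch of how the window-mode result might be used, the snippet below builds a stand-in data frame with the four documented columns and summarises it with base R. The values, the chromosome names, the window coordinates, and the column types are made up for illustration; they are not output from Pi() and say nothing about how windows are laid out.

```r
# Made-up stand-in for the window-mode return value (illustration only)
pi_windows <- data.frame(
  Chromosome = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  Start = c(1, 100001, 200001, 1, 100001),
  End = c(100000, 200000, 300000, 100000, 200000),
  Pi = c(0.0012, 0.0031, 0.0008, 0.0020, 0.0015)
)

# Window with the highest nucleotide diversity
top_window <- pi_windows[which.max(pi_windows$Pi), ]

# Mean Pi per chromosome
mean_pi <- aggregate(Pi ~ Chromosome, data = pi_windows, FUN = mean)
```

The per-chromosome mean here is an unweighted average of window values, which is only a quick summary; it is not the same as running Pi() in batch mode on each chromosome.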

Examples

library(GenoPop)

# Example VCF shipped with the package, plus its tabix index
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
total_sequence_length <- 999299  # Total length of the sequence in the VCF, in bases

# Batch mode example: returns Pi across the whole sequence
pi_value <- Pi(vcf_file, total_sequence_length)

# Window mode example: returns a data frame with one row per window
pi_windows <- Pi(vcf_file, seq_length = total_sequence_length,
                 window_size = 100000, skip_size = 50000)

GenoPop documentation built on April 3, 2025, 9:51 p.m.
