het_calc: Calculate heterozygosity from genotypes or allele frequencies


het_calc (R Documentation)

Calculate heterozygosity from genotypes or allele frequencies

Description

This function takes a long-format data table of genotypes or allele frequencies (allelic counts) and calculates heterozygosity. Heterozygosity is calculated either as the SNP-wise heterozygosity (at each SNP, excluding monomorphic sites) or as the genomic heterozygosity (the SNP-wise heterozygosity standardised for total sites assayed, including monomorphic sites). For individual genotypes, you can use multiallelic data. For population allele frequencies, you must code the data as biallelic reference and alternate alleles.

Usage

het_calc(
  snpData,
  extraData,
  method,
  type,
  chromCol = "CHROM",
  posCol = "POS",
  locusCol = "LOCUS",
  sampCol = "SAMPLE",
  genoCol = "GT",
  popCol = "POP",
  roCol = "RO",
  aoCol = "AO",
  covCol = "COV.SITES",
  indsCol = "INDS"
)

Arguments

snpData

Data table: Genotypes for individuals or frequencies (read counts) for populations in long format. See Details for parameterisation.

extraData

Data table: Extra info for individuals or populations. See Details for parameterisation.

method

Character: One of 'genomic' or 'snpwise' to calculate genomic or SNP-wise heterozygosity, respectively.

type

Character: One of 'genos' or 'freqs' to calculate heterozygosity from genotypes or allele frequencies, respectively.

chromCol

Character: The column with chromosome ID. Default is 'CHROM'.

posCol

Character: The column with the positional info. Default is 'POS'.

locusCol

Character: The column with locus ID. Default is 'LOCUS'.

sampCol

Character: The column name with the sampled individual ID. Default is 'SAMPLE'.

genoCol

Character: The column with the genotype info. Default is 'GT'. Genotypes should be scored as alleles separated by '/', e.g., '0/0', '0/1', '1/1', etc.

popCol

Character: The column name with population ID. Default is 'POP'.

roCol

Character: The column with reference allele counts. Default is 'RO'.

aoCol

Character: The column with alternate allele counts. Default is 'AO'.

covCol

Character: The column with the number of genomic sites covered per chromosome. Default is 'COV.SITES'.

indsCol

Character: The column name with the number of pooled individuals per population per chromosome. Default is 'INDS'.

Details

The genomic heterozygosity (also known as the autosomal heterozygosity) has been demonstrated to be the more accurate and robust measure of heterozygosity (Schmidt et al. 2021). SNP-wise heterozygosity can suffer from sampling biases (filtering, missing data, sample size, etc.). Note that estimates of genomic heterozygosity are orders of magnitude smaller than SNP-wise heterozygosity, so do not be alarmed if you see very small values!
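To make the difference in scale concrete, here is a minimal base R sketch using made-up data. It is an illustration only, not the internal implementation of het_calc: the genotype vector is hypothetical, and the covered-site count of 150 is an assumption borrowed from the Examples.

```r
# Toy genotypes for one individual at 5 SNPs on one chromosome (hypothetical data)
gt <- c("0/1", "0/0", "1/1", "0/1", "0/0")

# A genotype is heterozygous when its two alleles differ
alleles <- strsplit(gt, "/", fixed = TRUE)
is_het <- vapply(alleles, function(a) a[1] != a[2], logical(1))

# SNP-wise: heterozygous sites divided by the number of SNPs
het_snpwise <- sum(is_het) / length(gt)

# Genomic: heterozygous sites divided by all covered sites,
# including monomorphic ones (assumed to be 150 here)
cov_sites <- 150
het_genomic <- sum(is_het) / cov_sites

het_snpwise  # 0.4
het_genomic  # 2/150, roughly 0.0133
```

The same two heterozygous sites give 0.4 when divided by the 5 SNPs but about 0.013 when divided by the 150 covered sites, which is why genomic estimates look so small.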

Heterozygosity for population pools is calculated using the method from Ferretti et al. (2013). You can calculate genomic heterozygosity for population pools (standardising by total covered sites) or SNP-wise heterozygosity (standardising by the number of polymorphic sites observed).
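The two standardisations for pools can be sketched in base R as follows. This is a rough intuition only: it uses the naive expected heterozygosity 2p(1 - p) per site and omits any pool-size or read-depth correction, so it is not the Ferretti et al. (2013) estimator and will not reproduce het_calc output. The read counts and the covered-site count of 150 are hypothetical.

```r
# Hypothetical read counts for one pool at 3 biallelic SNPs
ro <- c(30, 10, 25)  # reference allele counts
ao <- c(10, 30, 25)  # alternate allele counts

# Reference allele frequency per SNP
p <- ro / (ro + ao)

# Naive per-SNP expected heterozygosity, 2p(1 - p); no pooling correction
het_site <- 2 * p * (1 - p)

# SNP-wise: standardise by the number of polymorphic sites observed
het_snpwise <- sum(het_site) / length(het_site)

# Genomic: standardise by total covered sites (assumed to be 150 here)
cov_sites <- 150
het_genomic <- sum(het_site) / cov_sites
```

Only the denominator changes between the two measures; the per-site values in the numerator are the same.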

You must specify both type and method. These dictate the columns required in snpData and extraData.

If type=='genos' and method=='genomic':

  1. snpData requires columns specified in: chromCol, posCol, locusCol, sampCol, and genoCol.

  2. extraData requires columns specified in: chromCol, sampCol, and covCol.

If type=='genos' and method=='snpwise':

  1. snpData requires columns specified in: chromCol, posCol, locusCol, sampCol, and genoCol.

  2. extraData will NOT be used.

If type=='freqs' and method=='genomic':

  1. snpData requires columns specified in: chromCol, posCol, locusCol, popCol, roCol, and aoCol.

  2. extraData requires columns specified in: chromCol, popCol, covCol, and indsCol.

If type=='freqs' and method=='snpwise':

  1. snpData requires columns specified in: chromCol, posCol, locusCol, popCol, roCol, and aoCol.

  2. extraData requires columns specified in: chromCol, popCol, and indsCol.
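Before calling het_calc, it can help to confirm that the inputs carry the required columns. The sketch below does this for type=='genos' and method=='genomic' using the default column names from Usage. The helper missing_cols is hypothetical (it is not part of genomalicious), and the toy inputs are plain data frames used only to check column names, whereas het_calc itself expects data tables.

```r
# Hypothetical helper (not part of genomalicious): report missing columns
missing_cols <- function(dat, needed) setdiff(needed, colnames(dat))

# Toy inputs using the default column names from Usage (made-up values)
snp_toy <- data.frame(
  CHROM = "chr1",
  POS = c(10L, 25L),
  LOCUS = c("chr1_10", "chr1_25"),
  SAMPLE = "ind1",
  GT = c("0/1", "0/0")
)

# COV.SITES is left out on purpose to show a failed check
extra_toy <- data.frame(CHROM = "chr1", SAMPLE = "ind1")

# Columns listed above for snpData and extraData
missing_cols(snp_toy, c("CHROM", "POS", "LOCUS", "SAMPLE"))   # character(0)
missing_cols(extra_toy, c("CHROM", "SAMPLE", "COV.SITES"))    # "COV.SITES"
```

If you pass non-default names through chromCol, sampCol, and so on, check for those names instead.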

Value

Returns a data table of heterozygosity estimates per sample or population. Genomic heterozygosity is reported per chromosome; SNP-wise heterozygosity is reported across all SNPs.

References

Ferretti et al. (2013) Molecular Ecology. DOI: 10.1111/mec.12522
Schmidt et al. (2021) Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.13659

Examples

library(genomalicious)

data(data_Genos)
data(data_PoolFreqs)

# Convert genos to characters
data_Genos[, GT:=genoscore_converter(GT)]

# Make extra data for the samples and population pools
extraSampInfo <- CJ(
  SAMPLE=unique(data_Genos$SAMPLE),
  CHROM=unique(data_Genos$CHROM),
  COV.SITES=150
  )

extraPoolInfo <- CJ(
  POP=unique(data_PoolFreqs$POP),
  CHROM=unique(data_PoolFreqs$CHROM),
  COV.SITES=150,
  INDS=30
  )

# Genomic heterozygosity of individuals, per chromosome/contig
het_calc(data_Genos, extraSampInfo, type='genos', method='genomic')

# SNP-wise heterozygosity of individuals, across all SNPs (extraData is not used)
het_calc(data_Genos, extraSampInfo, type='genos', method='snpwise')

# Genomic heterozygosity of population pools, per chromosome/contig
het_calc(data_PoolFreqs, extraPoolInfo, type='freqs', method='genomic')

# SNP-wise heterozygosity of population pools, across all SNPs
het_calc(data_PoolFreqs, extraPoolInfo, type='freqs', method='snpwise')

j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.