IterateHWE: Iteratively Estimate Population Parameters and Genotypes In a...

View source: R/pipelines.R

IterateHWER Documentation

Iteratively Estimate Population Parameters and Genotypes In a Diversity Panel

Description

These are wrapper function that iteratively run other polyRAD functions until allele frequencies stabilize to within a user-defined threshold. Genotype posterior probabilities can then be exported for downstream analysis.

Usage

IterateHWE(object, selfing.rate = 0, tol = 1e-05,
           excludeTaxa = GetBlankTaxa(object),
           overdispersion = 9)

IterateHWE_LD(object, selfing.rate = 0, tol = 1e-05, 
              excludeTaxa = GetBlankTaxa(object),
              LDdist = 1e4, minLDcorr = 0.2,
              overdispersion = 9)

IteratePopStruct(object, selfing.rate = 0, tol = 1e-03,
                 excludeTaxa = GetBlankTaxa(object),
                 nPcsInit = 10, minfreq = 0.0001,
                 overdispersion = 9, maxR2changeratio = 0.05)

IteratePopStructLD(object, selfing.rate = 0, tol = 1e-03,
                   excludeTaxa = GetBlankTaxa(object),
                   nPcsInit = 10, minfreq = 0.0001, LDdist = 1e4, 
                   minLDcorr = 0.2,
                   overdispersion = 9, maxR2changeratio = 0.05)

Arguments

object

A "RADdata" object.

selfing.rate

A number ranging from zero to one indicating the frequency of self-fertilization in the species. For individuals with odd ploidy (e.g. triploids), the selfing rate is always treated as zero and a warning is printed if a value above zero is provided.

tol

A number indicating when the iteration should end. It indicates the maximum mean difference in allele frequencies between iterations that is tolerated. Larger numbers will lead to fewer iterations.

excludeTaxa

A character vector indicating names of taxa that should be excluded from allele frequency estimates and chi-squared estimates.

nPcsInit

An integer indicating the number of principal component axes to initially estimate from object$depthRatio. Passed to AddPCA.

minfreq

A number indicating the minimum allele frequency allowed. Passed to AddAlleleFreqByTaxa.

LDdist

The distance, in basepairs, within which to search for alleles that may be in linkage disequilibrium with a given allele.

minLDcorr

The minimum correlation coefficient between two alleles for linkage disequilibrium between those alleles to be used by the pipeline for genotype estimation; see AddAlleleLinkages.

overdispersion

Overdispersion parameter; see AddGenotypeLikelihood.

maxR2changeratio

This number determines how many principal component axes are retained. The difference in R^2 values between the first and second axes is multiplied by maxR2changeratio. The last axis retained is the first axis after which the R^2 value changes by less than this value. Lower values of maxR2changeratio will result in more axes being retained.

Details

For IterateHWE, the following functions are run iteratively, assuming no population structure: AddAlleleFreqHWE, AddGenotypePriorProb_HWE, AddGenotypeLikelihood, AddPloidyChiSq, and AddGenotypePosteriorProb.

IterateHWE_LD runs each of the functions listed for IterateHWE once, then runs AddAlleleLinkages. It then runs AddAlleleFreqHWE, AddGenotypePriorProb_HWE, AddGenotypePriorProb_LD, AddGenotypeLikelihood, AddPloidyChiSq, and AddGenotypePosteriorProb iteratively until allele frequencies converge.

For IteratePopStruct, the following functions are run iteratively, modeling population structure: AddPCA, AddAlleleFreqByTaxa, AddAlleleFreqHWE, AddGenotypePriorProb_ByTaxa, AddGenotypeLikelihood, AddPloidyChiSq, and AddGenotypePosteriorProb. After the first PCA analysis, the number of principal component axes is not allowed to decrease, and can only increase by one from one round to the next, in order to help the algorithm converge.

IteratePopStructLD runs each of the functions listed for IteratePopStruct once, then runs AddAlleleLinkages. It then runs AddAlleleFreqHWE, AddGenotypePriorProb_ByTaxa, AddGenotypePriorProb_LD, AddGenotypeLikelihood, AddPloidyChiSq, AddGenotypePosteriorProb, AddPCA, and AddAlleleFreqByTaxa iteratively until convergence of allele frequencies.

Value

A "RADdata" object identical to that passed to the function, but with $alleleFreq, $priorProb, $depthSamplingPermutations, $genotypeLikelihood, $ploidyChiSq, and $posteriorProb slots added. For IteratePopStruct and IteratePopStructLD, $alleleFreqByTaxa and $PCA are also added. For IteratePopStructLD and IterateHWE_LD, $alleleLinkages and $priorProbLD are also added.

Note

If you see the error

Error in if (rel_ch < threshold & count > 5) { : missing value where TRUE/FALSE needed

try lowering nPcsInit.

Author(s)

Lindsay V. Clark

See Also

GetWeightedMeanGenotypes for outputting genotypes in a useful format after iteration is completed.

StripDown to remove memory-hogging slots that are no longer needed after the pipeline has been run.

PipelineMapping2Parents for mapping populations.

Examples

# load dataset
data(exampleRAD)

# iteratively estimate parameters
exampleRAD <- IterateHWE(exampleRAD)

# export results
GetWeightedMeanGenotypes(exampleRAD)

# re-load to run pipeline assuming population structure
data(exampleRAD)

# run pipeline
exampleRAD <- IteratePopStruct(exampleRAD, nPcsInit = 3)

# export results
GetWeightedMeanGenotypes(exampleRAD)

# dataset for LD pipeline
data(Msi01genes)

# run HWE + LD pipeline
mydata1 <- IterateHWE_LD(Msi01genes)

# run pop. struct + LD pipeline
# (tolerance raised to make example run faster)
mydata2 <- IteratePopStructLD(Msi01genes, tol = 0.01)

lvclark/polyRAD documentation built on Jan. 15, 2024, 4:19 a.m.