utils.outflank: OutFLANK: An Fst outlier approach by Mike Whitlock and Katie...

View source: R/utils.outflank.r

utils.outflank    R Documentation

OutFLANK: An Fst outlier approach by Mike Whitlock and Katie Lotterhos, University of British Columbia.

Description

This function is the original implementation of OutFLANK by Whitlock and Lotterhos. dartR simply provides a convenient wrapper around their functions and, being an R package, an easier installation (for more information, please refer to their GitHub repository).

Usage

utils.outflank(
  FstDataFrame,
  LeftTrimFraction = 0.05,
  RightTrimFraction = 0.05,
  Hmin = 0.1,
  NumberOfSamples,
  qthreshold = 0.05
)

Arguments

FstDataFrame

A data frame that includes a row for each locus, with columns as follows:

  • $LocusName: a character string that uniquely names each locus.

  • $FST: Fst calculated for this locus. (Kept here to report the unbiased Fst of the results)

  • $T1: The numerator of the estimator for Fst (necessary, with $T2, to calculate mean Fst)

  • $T2: The denominator of the estimator of Fst

  • $FSTNoCorr: Fst calculated for this locus without sample size correction. (Used to find outliers)

  • $T1NoCorr: The numerator of the estimator for Fst without sample size correction (necessary, with $T2NoCorr, to calculate mean Fst)

  • $T2NoCorr: The denominator of the estimator of Fst without sample size correction

  • $He: The heterozygosity of the locus (used to screen out low heterozygosity loci that have a different distribution)
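
As a rough illustration of the layout described above, the following base R sketch builds a toy data frame with the eight expected columns and checks that none is missing. All values are invented for illustration. The mean Fst is computed here as the ratio of summed numerators to summed denominators, which is an assumption about how $T1 and $T2 are combined rather than something this page states.

```r
# Toy input with the columns listed above; all values are made up.
fst_df <- data.frame(
  LocusName = paste0("locus_", 1:5),
  FST       = c(0.02, 0.05, 0.10, 0.01, 0.30),
  T1        = c(0.004, 0.010, 0.020, 0.002, 0.060),
  T2        = c(0.20, 0.20, 0.20, 0.20, 0.20),
  FSTNoCorr = c(0.03, 0.06, 0.11, 0.02, 0.31),
  T1NoCorr  = c(0.006, 0.012, 0.022, 0.004, 0.062),
  T2NoCorr  = c(0.20, 0.20, 0.20, 0.20, 0.20),
  He        = c(0.30, 0.25, 0.40, 0.05, 0.35),
  stringsAsFactors = FALSE
)

# Check that every expected column is present before passing the data on.
required <- c("LocusName", "FST", "T1", "T2",
              "FSTNoCorr", "T1NoCorr", "T2NoCorr", "He")
missing_cols <- setdiff(required, names(fst_df))

# Assumed form of the mean Fst: sum of numerators over sum of denominators.
fst_bar        <- sum(fst_df$T1) / sum(fst_df$T2)
fst_bar_nocorr <- sum(fst_df$T1NoCorr) / sum(fst_df$T2NoCorr)
```

A check like this is cheap to run before calling the wrapper, since a missing or misnamed column is an easy mistake to make when the data frame is assembled by hand.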

LeftTrimFraction

The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied [default 0.05].

RightTrimFraction

The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied [default 0.05].

Hmin

The minimum heterozygosity required before including calculations from a locus [default 0.1].

NumberOfSamples

The number of spatial locations included in the data set.

qthreshold

The desired false discovery rate threshold for calculating q-values [default 0.05].
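
To make the roles of LeftTrimFraction, RightTrimFraction and Hmin more concrete, here is a minimal base R sketch of the kind of screening they describe. The helper trim_loci is hypothetical and is not part of dartR or OutFLANK. It trims by quantiles, which is a simplification, and it treats a locus with heterozygosity exactly equal to Hmin as retained, which is an assumption.

```r
# Hypothetical helper (not dartR/OutFLANK code): flag the loci that survive
# the heterozygosity screen and fall inside the trimmed Fst range.
trim_loci <- function(fst, he, left = 0.05, right = 0.05, hmin = 0.1) {
  # Assumption: loci with He exactly equal to hmin are kept.
  keep_h <- !is.na(he) & he >= hmin
  fst_ok <- fst[keep_h]
  # Quantile-based cut-offs stand in for the trimming of each tail.
  lo <- quantile(fst_ok, probs = left, names = FALSE)
  hi <- quantile(fst_ok, probs = 1 - right, names = FALSE)
  keep_h & fst >= lo & fst <= hi
}

# Toy data: 100 loci, the first 10 with low heterozygosity.
fst  <- seq(0.01, 1.00, by = 0.01)
he   <- c(rep(0.05, 10), rep(0.30, 90))
core <- trim_loci(fst, he)
```

With the default fractions, the 90 loci that pass the heterozygosity screen lose roughly 5% from each tail, leaving 80 loci in the core set used to characterise the neutral distribution in this toy example.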

Details

This method looks for Fst outliers from a list of Fst's for different loci. It assumes that each locus has been genotyped in all populations with approximately equal coverage.

OutFLANK estimates the distribution of Fst based on a trimmed sample of Fst's. It assumes that the majority of loci in the center of the distribution are neutral and infers the shape of the distribution of neutral Fst using a trimmed set of loci. Loci with the highest and lowest Fst's are trimmed from the data set before this inference, and the quantity Fst * df / (mean Fst) is assumed to follow a chi-square distribution with df degrees of freedom. Each locus is then given a q-value based on its quantile in the inferred null distribution.
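
The chi-square step can be sketched in a few lines of base R. The helper neutral_pvalues below is hypothetical and is not the OutFLANK code: it takes the mean Fst and the degrees of freedom as given rather than inferring them by likelihood, computes only right-tail p-values, and uses the Benjamini-Hochberg adjustment from p.adjust as a stand-in for the q-value calculation.

```r
# Hypothetical helper (not OutFLANK code): right-tail p-values under the
# assumption that fst * df / fst_bar follows a chi-square distribution.
neutral_pvalues <- function(fst, fst_bar, df) {
  stat    <- fst * df / fst_bar
  p_right <- pchisq(stat, df = df, lower.tail = FALSE)
  # Stand-in for q-values: Benjamini-Hochberg adjusted p-values.
  q_like  <- p.adjust(p_right, method = "BH")
  data.frame(stat = stat, p_right = p_right, q_like = q_like)
}

# Toy values: four loci near the mean and one with a much higher Fst.
fst <- c(0.01, 0.02, 0.03, 0.02, 0.25)
res <- neutral_pvalues(fst, fst_bar = 0.02, df = 4)
res$OutlierFlag <- res$q_like < 0.05
```

In this toy example only the last locus is flagged, because its scaled statistic lies far in the right tail of the assumed null distribution while the others sit close to its center.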

The main procedure is called OutFLANK; see the comments in that function for input and output formats. The other functions are required and must be loaded, but users do not normally need to call them directly.

Value

The function returns a list with the following elements:

  • FSTbar: the mean FST inferred from loci not marked as outliers

  • FSTNoCorrbar: the mean FST (not corrected for sample size; gives an upwardly biased estimate of FST)

  • dfInferred: the inferred number of degrees of freedom for the chi-square distribution of neutral FST

  • numberLowFstOutliers: Number of loci flagged as having a significantly low FST (not reliable)

  • numberHighFstOutliers: Number of loci identified as having significantly high FST

  • results: a data frame with a row for each locus. This data frame includes all the original columns in the data set, and six new ones:

    • $indexOrder (the original order of the input data set),

    • $GoodH (Boolean variable which is TRUE if the expected heterozygosity is greater than the Hmin set in the input),

    • $OutlierFlag (TRUE if the method identifies the locus as an outlier, FALSE otherwise),

    • $q (the q-value for the test of neutrality for the locus),

    • $pvalues (the p-value for the test of neutrality for the locus), and

    • $pvaluesRightTail (the one-sided, right-tail p-value for the locus)
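
A typical next step is to pull the flagged loci out of the results data frame. The sketch below works on a hand-built mock of the returned list, so it runs without dartR. The values are invented, only a few of the documented columns are included, and treating $GoodH and $OutlierFlag as logical columns (with NA for a locus that failed the heterozygosity screen) is an assumption based on the description above.

```r
# Mock of the returned list; values are invented and only some of the
# documented elements and columns are included.
out <- list(
  numberHighFstOutliers = 1,
  results = data.frame(
    LocusName   = c("locus_1", "locus_2", "locus_3"),
    FST         = c(0.02, 0.35, 0.01),
    GoodH       = c(TRUE, TRUE, FALSE),
    OutlierFlag = c(FALSE, TRUE, NA),
    q           = c(0.80, 0.001, NA),
    stringsAsFactors = FALSE
  )
)

res <- out$results
# which() drops NA flags, so loci without a usable test are not selected.
hits <- res[which(res$OutlierFlag & res$GoodH), ]
# Most strongly supported outliers first.
hits <- hits[order(hits$q), ]
```

Using which() rather than plain logical indexing avoids rows of NA appearing in the output when a flag is missing.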

Author(s)

Bernd Gruber (bugs? Post to https://groups.google.com/d/forum/dartr); original implementation of Whitlock & Lotterhos


green-striped-gecko/dartR documentation built on Sept. 7, 2024, 4:15 a.m.