OutFLANK: Fst outliers with trimming
In whitlock/OutFLANK: Fst outliers with trimming


Takes Fst data for a list of loci to find outliers, using a trimmed likelihood approach.

OutFLANK(FstDataFrame, LeftTrimFraction = 0.05, RightTrimFraction = 0.05, Hmin = 0.1, NumberOfSamples, qthreshold = 0.05)

`FstDataFrame`	A data frame that includes a row for each locus, with columns as follows:
- $LocusName: a character string that uniquely names each locus.
- $FST: Fst calculated for this locus. (Kept here to report the unbiased Fst of the results)
- $T1: The numerator of the estimator for Fst (necessary, with $T2, to calculate mean Fst)
- $T2: The denominator of the estimator of Fst
- $FSTNoCorr: Fst calculated for this locus without sample size correction. (Used to find outliers)
- $T1NoCorr: The numerator of the estimator for Fst without sample size correction (necessary, with $T2NoCorr, to calculate mean Fst without sample size correction)
- $T2NoCorr: The denominator of the estimator of Fst without sample size correction
- $He: The heterozygosity of the locus (used to screen out low heterozygosity loci that have a different distribution)
`LeftTrimFraction`	The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied.
`RightTrimFraction`	The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied.
`Hmin`	The minimum heterozygosity required before including calculations from a locus.
`NumberOfSamples`	The number of spatial locations included in the data set.
`qthreshold`	The desired false discovery rate threshold for calculating q-values.
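
As a minimal sketch of the input layout described above, the following builds a toy `FstDataFrame` with made-up values, checks that the required columns are present, and computes a mean Fst as the ratio of summed numerators to summed denominators. The helper name `check_fst_columns` and all numeric values are hypothetical, and the ratio-of-sums form of the mean is an assumption based on the column descriptions, not code taken from the package.

```r
# Column names required by the argument description above.
required_cols <- c("LocusName", "FST", "T1", "T2",
                   "FSTNoCorr", "T1NoCorr", "T2NoCorr", "He")

# Hypothetical helper: returns the names of any required columns that are missing.
check_fst_columns <- function(df) {
  setdiff(required_cols, names(df))
}

# Toy data with invented values, for illustration only.
toy <- data.frame(
  LocusName = c("loc1", "loc2", "loc3"),
  T1        = c(0.010, 0.030, 0.002),
  T2        = c(0.20, 0.30, 0.05),
  T1NoCorr  = c(0.012, 0.033, 0.003),
  T2NoCorr  = c(0.20, 0.30, 0.05),
  He        = c(0.40, 0.45, 0.08),
  stringsAsFactors = FALSE
)
toy$FST       <- toy$T1 / toy$T2
toy$FSTNoCorr <- toy$T1NoCorr / toy$T2NoCorr

missing_cols <- check_fst_columns(toy)

# Assumed form of the mean: ratio of sums, not the mean of per-locus ratios.
mean_fst        <- sum(toy$T1) / sum(toy$T2)
mean_fst_nocorr <- sum(toy$T1NoCorr) / sum(toy$T2NoCorr)
```

An empty `missing_cols` means the data frame has every column the function expects.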

This function takes a data frame ("FstDataFrame") that has columns for $LocusName, $FST, $T1, $T2, $FSTNoCorr, $T1NoCorr, $T2NoCorr, and $He. It returns a list whose results element is a data frame with those same columns plus new columns, including $OutlierFlag and $q.

This function requires Fst values calculated without sample size correction. These can be calculated, for example, with WC_FST_FiniteSample_Haploids_2AllelesB_NoSamplingCorrection in this package.

The outlier tests use FST values calculated without sample size correction. Such FSTs are biased upwards, but as long as the sample size is similar for all loci, the resulting measures ought to give similar results. Using the biased FSTs is necessary for the trimming outlier approach with small samples, because debiasing sometimes creates negative Fsts, which do not fit the chi-square distribution.
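
The screening and trimming controlled by `Hmin`, `LeftTrimFraction`, and `RightTrimFraction` can be illustrated roughly as follows. This is a sketch that uses empirical quantile cutoffs on simulated values; it is not the package's internal code, and the function name `trim_fst` and the simulated data are hypothetical.

```r
# Hypothetical helper: keep loci with He above hmin, then drop the lowest and
# highest fractions of uncorrected Fst using empirical quantiles.
trim_fst <- function(fst, he, left = 0.05, right = 0.05, hmin = 0.1) {
  fst_ok <- fst[he > hmin]
  bounds <- quantile(fst_ok, probs = c(left, 1 - right), names = FALSE)
  fst_ok[fst_ok >= bounds[1] & fst_ok <= bounds[2]]
}

set.seed(1)
# Simulated, non-negative "uncorrected Fst" values (scaled chi-square draws).
fst_sim <- rchisq(1000, df = 4) * 0.05 / 4
he_sim  <- runif(1000, min = 0, max = 0.5)

trimmed <- trim_fst(fst_sim, he_sim)
```

Because the simulated values are never negative, they stay within the support of a chi-square distribution, which is the property the uncorrected FSTs are meant to preserve.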

The function returns a list with the following elements:

FSTbar: the mean FST inferred from loci not marked as outliers
FSTNoCorrbar: the mean FST not corrected for sample size (gives an upwardly biased estimate of FST)
dfInferred: the inferred number of degrees of freedom for the chi-square distribution of neutral FST
numberLowFstOutliers: Number of loci flagged as having a significantly low FST (not reliable)
numberHighFstOutliers: Number of loci identified as having significantly high FST
results: a data frame with a row for each locus. This data frame includes all the original columns in the data set, and six new ones:
- $indexOrder (the original order of the input data set),
- $GoodH (Boolean variable which is TRUE if the expected heterozygosity is greater than the Hmin set by input),
- $OutlierFlag (TRUE if the method identifies the locus as an outlier, FALSE otherwise),
- $q (the q-value for the test of neutrality for the locus),
- $pvalues (the p-value for the test of neutrality for the locus), and
- $pvaluesRightTail (the one-sided, right-tail p-value for the locus).
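
To illustrate how `dfInferred`, `FSTNoCorrbar`, and `$pvaluesRightTail` might relate, the sketch below scales each uncorrected Fst by the degrees of freedom over the mean Fst and takes the upper tail of a chi-square distribution. The scaling is an assumption about the neutral model, Benjamini-Hochberg adjustment via `p.adjust` is used as a simple stand-in for q-values, and every input value is invented; none of this is taken from the package's source.

```r
# Invented inputs for illustration.
fst_nocorr <- c(0.02, 0.05, 0.04, 0.45, 0.03)
fst_bar    <- 0.05   # assumed mean uncorrected Fst of neutral loci
df_neutral <- 4      # assumed degrees of freedom
q_cutoff   <- 0.05   # plays the role of qthreshold

# Assumption: df * Fst / mean(Fst) follows a chi-square distribution under neutrality.
stat    <- df_neutral * fst_nocorr / fst_bar
p_right <- pchisq(stat, df = df_neutral, lower.tail = FALSE)

# Stand-in for q-values: Benjamini-Hochberg adjusted p-values.
q_like  <- p.adjust(p_right, method = "BH")
outlier <- q_like < q_cutoff
```

In this toy input only the fourth locus, whose Fst is far above the assumed mean, falls below the cutoff.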