View source: R/gl.report.secondaries.r
gl.report.secondaries | R Documentation |
SNP datasets generated by DArT include fragments with more than one SNP (that is, with secondaries). They are recorded separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment are likely to be linked, and so you may wish to remove secondaries.
This function reports statistics associated with secondaries, and the consequences of filtering them out, and provides three plots. The first is a boxplot, the second is a barplot of the frequency of secondaries per sequence tag, and the third is the Poisson expectation for those frequencies including an estimate of the zero class (no. of sequence tags with no SNP scored).
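The counts behind these statistics can be illustrated with a minimal base-R sketch. It assumes a hypothetical vector `clone_id` with one CloneID per SNP locus; the values are toy data and the code is not the function's own implementation.

```r
# Hypothetical CloneIDs, one entry per SNP locus (toy data, not a real DArT report)
clone_id <- c("100", "100", "101", "102", "102", "102", "103")

# Number of SNPs scored on each sequence tag
snps_per_tag <- as.vector(table(clone_id))

# Sequence tags carrying more than one SNP, i.e. tags with secondaries
n_tags_secondaries <- sum(snps_per_tag > 1)

# Loci that would be dropped if only one SNP per tag were retained
n_snps_secondaries <- sum(snps_per_tag - 1)

# Frequency of tags by number of SNPs (the kind of data a barplot would show)
freq <- table(snps_per_tag)
print(freq)
```

Here two of the four tags carry secondaries, and filtering would remove three loci.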
gl.report.secondaries(
  x,
  nsim = 1000,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)
x |
Name of the genlight object containing the SNP data [required]. |
nsim |
The number of simulations to estimate the mean of the Poisson distribution [default 1000]. |
taglength |
Typical length of the sequence tags [default 69]. |
plot.out |
Specify if plot is to be produced [default TRUE]. |
plot_theme |
Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors |
List of two color names for the borders and fill of the plots [default two_colors]. |
save2tmp |
If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
The function gl.filter.secondaries will filter out the loci with secondaries, retaining only one SNP per sequence tag.
Heterozygosity as estimated by the function gl.report.heterozygosity is in a sense relative, because it is calculated against a background of only those loci that are polymorphic somewhere in the dataset. To allow comparison across studies and species, any measure of heterozygosity needs to accommodate loci that are invariant (autosomal heterozygosity; see Schmidt et al. 2021). However, the number of invariant loci is unknown, because the SNPs are detected as single point mutational variants and invariant sequences are discarded, and because of the particular additional filtering applied before analysis. Modelling the counts of SNPs per sequence tag as a Poisson distribution in this script allows estimation of the zero class, that is, the number of invariant loci. This is reported, and the reliability of the estimate can be assessed by the correspondence of the observed frequencies with those under Poisson expectation in the associated graphs. The number of invariant loci can then optionally be provided to the function gl.report.heterozygosity via the parameter n.invariant.
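The idea of estimating the zero class can be sketched in base R as follows. This is only an illustration under stated assumptions: the observed counts are hypothetical, and the sketch fits a zero-truncated Poisson by solving the maximum-likelihood equation directly with `uniroot`, which may differ from the simulation-based procedure (controlled by nsim) that the function itself uses.

```r
# Hypothetical numbers of sequence tags carrying 1, 2, 3 and 4 SNPs
obs <- c(`1` = 500, `2` = 120, `3` = 20, `4` = 3)
k <- as.integer(names(obs))
n_obs <- sum(obs)

# Mean SNPs per tag among tags with at least one SNP
mean_obs <- sum(k * obs) / n_obs

# For a zero-truncated Poisson, the MLE of lambda solves
#   lambda / (1 - exp(-lambda)) = observed mean
f <- function(lambda) lambda / -expm1(-lambda) - mean_obs
lambda_hat <- uniroot(f, interval = c(1e-8, 50), tol = 1e-10)$root

# Expected size of the unobserved zero class (invariant sequence tags)
p0 <- exp(-lambda_hat)
n_invariant_tags <- n_obs * p0 / (1 - p0)

# Poisson expectations for the observed classes, for comparison with obs
n_total <- n_obs + n_invariant_tags
expected <- n_total * dpois(k, lambda_hat)
print(round(cbind(observed = obs, expected = expected), 1))
```

A close match between the observed and expected columns suggests the Poisson model, and hence the zero-class estimate, is reasonable for the data.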
If the calculations for the Poisson expectation of the number of invariant sequence tags fail to converge, try rerunning the analysis with a larger nsim value.
This function now also calculates the number of invariant sites (i.e. nucleotides) of the sequence tags (if TrimmedSequence is present in x$other$loc.metrics) or estimates these by assuming that the average length of the sequence tags is 69 nucleotides (see taglength). Based on the Poisson expectation of the number of invariant sequence tags, it also estimates the number of invariant sites for these tags, to provide an estimate of the total number of invariant sites.
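The arithmetic described above can be sketched as follows. The sequences, SNP counts and zero-class estimate are all hypothetical, and the exact bookkeeping inside the function (for example, how tag lengths are handled when TrimmedSequence is absent) is an assumption here rather than a description of its source code.

```r
# Hypothetical trimmed sequences, one per unique sequence tag
trimmed <- c("ACGTACGTAC", "ACGTAC", "ACGTACGT")
snps_per_tag <- c(2, 1, 1)   # SNPs scored on each tag (toy values)
n_invariant_tags <- 5        # assumed estimate of the Poisson zero class

tag_len <- nchar(trimmed)
mean_len_tag <- mean(tag_len)

# Invariant sites within the tags that carry at least one SNP
n_inv_gen <- sum(tag_len) - sum(snps_per_tag)

# Add the sites contributed by tags estimated to carry no SNP at all
n_invariant <- n_inv_gen + n_invariant_tags * mean_len_tag
print(c(n.inv.gen = n_inv_gen, mean.len.tag = mean_len_tag, n.invariant = n_invariant))
```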
Note that previous versions of dartR returned only an estimate of the number of invariant sequence tags (not sites).
Plots are saved to the session temporary directory (tempdir).
Examples of other themes that can be used can be consulted in:
A data.frame with the list of parameter values
n.total.tags Number of sequence tags in total
n.SNPs.secondaries Number of secondary SNP loci that would be removed on filtering
n.invariant.tags Estimated number of invariant sequence tags
n.tags.secondaries Number of sequence tags with secondaries
n.inv.gen Number of invariant sites in sequenced tags
mean.len.tag Mean length of sequence tags
n.invariant Total number of invariant sites (including invariant sequence tags)
k Lambda: mean of the Poisson distribution of number of SNPs in the sequence tags
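A value can be pulled from this data.frame by name rather than by row position, which is less fragile if the row order changes. The sketch below uses a mock object: the column names `Param` and `Value` and all numbers are assumptions made for illustration, so check `names()` on the real return value before relying on them.

```r
# Mock of the returned data.frame; column names and values are assumptions
res <- data.frame(
  Param = c("n.total.tags", "n.SNPs.secondaries", "n.invariant.tags",
            "n.tags.secondaries", "n.inv.gen", "mean.len.tag",
            "n.invariant", "k"),
  Value = c(1000, 150, 800, 120, 60000, 65, 112000, 0.5)
)

# Look the parameter up by name instead of by row index
n_inv <- res$Value[res$Param == "n.invariant"]
print(n_inv)
```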
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
Schmidt, T.L., Jasper, M.-E., Weeks, A.R., Hoffmann, A.A., 2021. Unbiased population heterozygosity estimates from genome-wide sequence data. Methods in Ecology and Evolution n/a.
gl.filter.secondaries, gl.report.heterozygosity, utils.n.var.invariant
Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.hamming(), gl.report.heterozygosity(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.reproducibility(), gl.report.sexlinked(), gl.report.taglength()
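# Load the example datasets (provides platypus.gl)
require("dartR.data")
# Keep only loci with a complete call rate
test <- gl.filter.callrate(platypus.gl, threshold = 1)
# Report secondaries; the result is a data.frame of parameter values
n.inv <- gl.report.secondaries(test)
# Row 7 holds n.invariant, the total number of invariant sites
gl.report.heterozygosity(test, n.invariant = n.inv[7, 2])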