gl.report.hamming: Calculates the pairwise Hamming distance between DArT trimmed...

View source: R/gl.report.hamming.r

gl.report.hamming    R Documentation

Calculates the pairwise Hamming distance between DArT trimmed DNA sequences

Description

Hamming distance is the number of base differences between two sequences, expressed either as a count or as a proportion. Typically, it is calculated between two sequences of equal length. DArT trimmed sequences differ in length but are anchored to the left by the restriction enzyme recognition sequence, so it is sensible to compare two trimmed sequences starting immediately after the common recognition sequence and terminating at the last base of the shorter sequence.
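The comparison described above can be sketched in base R. This is only an illustration of the idea, not the dartR implementation: the helper name hamming_trimmed, its proportion argument, and the two example tags are hypothetical.

```r
# Hypothetical helper, not part of dartR: Hamming distance between two
# trimmed sequence tags, skipping the first `rs` bases (the recognition
# sequence) and stopping at the last base of the shorter tag.
hamming_trimmed <- function(a, b, rs = 5, proportion = FALSE) {
  n <- min(nchar(a), nchar(b))  # length of the shorter tag
  if (n <= rs) stop("tags must be longer than the recognition sequence")
  # Split the compared region (positions rs + 1 to n) into single bases
  a_bases <- strsplit(substr(a, rs + 1, n), "")[[1]]
  b_bases <- strsplit(substr(b, rs + 1, n), "")[[1]]
  d <- sum(a_bases != b_bases)  # number of mismatched positions
  if (proportion) d / (n - rs) else d
}

# Made-up tags sharing a 5-base prefix; the second tag is longer
tag1 <- "TGCAGACGTACGTTAGC"
tag2 <- "TGCAGACGAACGTTCGCAATT"
hamming_trimmed(tag1, tag2)                     # distance as a count
hamming_trimmed(tag1, tag2, proportion = TRUE)  # distance as a proportion
```

The proportion here is taken over the compared region only (shorter tag length minus rs), which is an assumption of this sketch.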

Usage

gl.report.hamming(
  x,
  rs = 5,
  threshold = 3,
  taglength = 69,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = two_colors,
  probar = FALSE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP data [required].

rs

Number of bases in the restriction enzyme recognition sequence [default 5].

threshold

Minimum acceptable base pair difference for display on the boxplot and histogram [default 3].

taglength

Typical length of the sequence tags [default 69].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

Theme for the plot. See Details for options [default theme_dartR()].

plot_colors

List of two color names for the borders and fill of the plots [default two_colors].

probar

If TRUE, then a progress bar is displayed on long loops [default FALSE].

save2tmp

If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Details

The function gl.filter.hamming filters out one of two loci if their Hamming distance is less than a specified percentage.

Hamming distance can be computed by exploiting the fact that the dot product of two binary vectors x and (1-y) counts the positions where x is 1 and y is 0; adding the symmetric term, the dot product of (1-x) and y, gives the total number of positions at which x and y differ. This approach can also be used for vectors that contain more than two possible values at each position (e.g. A, C, T or G).
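A minimal sketch of the dot-product idea, assuming sequences are stored as the columns of a matrix with one row per position. The function names hamming_binary and hamming_multi and the example data are hypothetical; this is not the code of utils.hamming.

```r
# Illustrative sketch only. Columns of X are sequences, rows are positions.

# Binary case: every entry of X is 0 or 1
hamming_binary <- function(X) {
  D <- t(1 - X) %*% X  # D[i, j] = positions where seq i is 0 and seq j is 1
  D + t(D)             # add the symmetric case (seq i is 1, seq j is 0)
}

# Multi-valued case (e.g. A, C, G, T): count agreements per symbol,
# then subtract the total from the sequence length
hamming_multi <- function(X) {
  agree <- matrix(0, ncol(X), ncol(X))
  for (s in unique(as.vector(X))) {
    I <- (X == s) * 1            # indicator matrix for symbol s
    agree <- agree + t(I) %*% I  # positions where both sequences equal s
  }
  nrow(X) - agree
}

# Made-up example data
X <- cbind(c(1, 0, 1, 1), c(1, 1, 0, 1), c(0, 0, 1, 1))
S <- cbind(strsplit("ACGT", "")[[1]],
           strsplit("ACGA", "")[[1]],
           strsplit("TCGA", "")[[1]])
hamming_binary(X)  # pairwise distance matrix for the binary sequences
hamming_multi(S)   # pairwise distance matrix for the DNA sequences
```

Both functions return a symmetric matrix of pairwise counts with zeros on the diagonal. Working with whole matrices avoids an explicit loop over sequence pairs.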

If a pair of DNA sequences are of differing length, the longer is truncated.

The algorithm is that of Johann de Jong (https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/), as implemented in utils.hamming.

If save2tmp = TRUE, the plots and table are saved to the session's temporary directory (tempdir).

Examples of other themes that can be used can be consulted in

Value

Returns the unaltered genlight object.

Author(s)

Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr

See Also

gl.filter.hamming

Other report functions: gl.report.bases(), gl.report.callrate(), gl.report.diversity(), gl.report.heterozygosity(), gl.report.hwe(), gl.report.ld.map(), gl.report.locmetric(), gl.report.maf(), gl.report.monomorphs(), gl.report.overshoot(), gl.report.pa(), gl.report.parent.offspring(), gl.report.rdepth(), gl.report.reproducibility(), gl.report.secondaries(), gl.report.sexlinked(), gl.report.taglength()

Examples

 
gl.report.hamming(testset.gl[,1:100])
gl.report.hamming(testset.gs[,1:100])


# SNP data
test <- gl.subsample.loci(platypus.gl, n = 50)
result <- gl.filter.hamming(test, threshold=0.25, verbose=3)


green-striped-gecko/dartR documentation built on Sept. 7, 2024, 4:15 a.m.