Description Usage Arguments Details Value Examples
Compute timescaled haplotypic densities (THDs) for a series of individuals based on genetic distances.
'thd' is the main function, taking a distance matrix as input. The helper function 'thd.bandwidth' computes the kernel density bandwidth (or smoothing parameter) for a given timescale, no. of markers and substitution rate.
1 2 3 |
H |
matrix of genetic (Hamming) distances between individuals. Must be symmetric. |
t |
timescale, reflecting the approximate length of time before present considered for density estimation |
m |
number of markers (e.g. nucleotides) used in genetic distance computation |
mu |
substitution rate per marker per unit of time |
scale |
optional rescaling of output. Either |
q |
quantile (see details) |
The scale
parameter provides scaling options detailed below.
"density"
:
(default) the THD output is a normalized density, that typically decreases with the no. of markers.
"size"
:
the density is multiplied by the number of markers m
. This scaled output is approximately invariant wrt to m
, allowing
to compare outputs between datasets with various resolutions (e.g. minisatellites with small m
vs whole genomes with large m
).
"relative"
:
the output is expressed relative to the theoretical maximum value, obtained for a set of completely identical isolates (that is, H
is a matrix of zeroes).
This scaling can be useful when the individuals are highly similar. The output is between 0 and 1.
"time"
:
the output is expressed relative to the theoretical value obtained if all individuals has diverged exactly at rate mu
for a period of length t
(that is,
H
is a matrix where all off-diagonal entries equal iam.distance(t, m, mu)
). This scaling allows to compare analyses with different timescales.
The optional quantile
parameter controls the relationship between the provided timescale t
and the kernel bandwidth.
By default, the bandwidth is chosen so that individuals whose estimated time of divergence is at most t
account for 50
Setting q = 0.1
means that a divergence less than t
now accounts for 10
The parameter is provided for the sake of completeness, change it only if you know what you're doing.
#' Additional requirements:
The unit of time must be identical for t
and mu
.
H
must be symmetric with zeroes on the diagonal
Vector of THD values, named after column names of H
if present.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | # Simulate distance matrix
H <- matrix(c(
00, 00, 00, 00, 00, 00,
01, 00, 00, 00, 00, 00,
03, 01, 00, 00, 00, 00,
21, 18, 28, 00, 00, 00,
23, 25, 31, 07, 00, 00,
19, 27, 23, 09, 11, 00
), ncol = 6, byrow = TRUE)
# Make matrix symmetric
H <- H + t(H)
colnames(H) <- LETTERS[1:ncol(H)]
# Dendrogram
hc <- hclust(as.dist(H))
# Select THD parameters
m <- 64
mu <- 0.001
t50 <- 20
# Compute THD values
x <- thd(H, t50, m, mu, scale = "time")
# Visualize dendrogram and THD
par(mfrow = c(2, 1))
par(mar = c(2, 10, 2, 10))
plot(hc, hang = -1, xlab = "", yaxt = "n", ylab = "")
par(mar = c(2, 4, 2, 4))
barplot(x[hc$order], xlim = c())
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.