Description

Group a TRAMPknowns object so that knowns with similar TRFLP patterns, and knowns that share the same species name, “group” together. In general, this function will be called automatically whenever appropriate (e.g. when loading a data set or adding new knowns). See Details for why this function is necessary and how it works.
The main reason for calling group.knowns manually is to change the default values of its arguments. If you call group.knowns on a TRAMPknowns object, any subsequent automatic call to group.knowns will use the arguments you passed in the manual call (e.g. after group.knowns(x, cut.height=20), all future groupings will use cut.height=20).
Usage

group.knowns(x, ...)
## S3 method for class 'TRAMPknowns'
group.knowns(x, dist.method, hclust.method, cut.height, ...)
## S3 method for class 'TRAMP'
group.knowns(x, ...)
Arguments

x: A TRAMPknowns or TRAMP object.
dist.method: Distance method used in calculating similarity between different knowns (see dist).
hclust.method: Clustering method used in generating clusters from the similarity matrix (see hclust).
cut.height: Passed to cutree; the height at which the cluster tree is cut (see cutree).
...: Arguments passed to further methods.
Details

group.knowns groups together knowns in a TRAMPknowns object based on two criteria: (1) TRFLP profiles that are very similar across shared enzyme/primer combinations (based on clustering), and (2) TRFLP profiles that belong to the same species (i.e. share a common value in the species column of the info data.frame of x; see TRAMPknowns for more information). This addresses three issues in TRFLP analysis:
1. The TRFLP profile of a single species can vary in peak sizes due to DNA sequence variation. Including multiple collections of each species allows this variation to be accounted for. If a TRAMPknowns object contains multiple collections of a species, group.knowns aggregates them. This aggregation is essential for community analysis, as leaving individual collections separate would artificially inflate the number of “present species” when running TRAMP.

   Some authors have taken an alternative approach, using a larger tolerance when matching peaks between samples and knowns (effectively increasing accept.error in TRAMP) to account for within-species variation. This is not recommended, as it dramatically increases the risk of incorrect matches.

2. Distinctly different TRFLP profiles may occur within a species (or in some cases within an individual); see Avis et al. (2006). group.knowns looks at the species column of the info data.frame of x and joins any knowns with identical species values into a group. This can also be used where multiple profiles are present in an individual.

3. Different species may share a similar TRFLP profile and therefore be indistinguishable using TRFLP. If these patterns are not grouped, both species will be recorded as present wherever either one is present. group.knowns prevents this by joining knowns with “very similar” TRFLP patterns into a group. Ideally, such problematic groups can be resolved by increasing the number of enzyme/primer pairs in the data.
Group names are generated by concatenating all unique (sorted) species names in the group, separated by commas.
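A minimal base R sketch of that naming rule follows. The helper make.group.name is hypothetical (it is not a TRAMP function), and it assumes the names are joined with ", "; the exact separator TRAMP uses may differ.

```r
# Hypothetical helper (not a TRAMP function): name a group after the
# sorted, de-duplicated species names of its members.
# Assumption: names are joined with ", "; TRAMP's exact separator may differ.
make.group.name <- function(species) {
  paste(sort(unique(species)), collapse = ", ")
}

# Toy species names for the knowns in one group
members <- c("sp. B", "sp. A", "sp. B")
make.group.name(members)  # "sp. A, sp. B"
```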
To determine whether knowns are “similar enough” to form a group, we use R's clustering tools: dist, hclust and cutree. First, we generate a distance matrix of the knowns' profiles using dist with method dist.method (see Examples; this is very similar to what TRAMP does, and dist.method should be specified accordingly). We then generate clusters using hclust with method hclust.method, and “cut” the tree at cut.height using cutree.
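The clustering step can be sketched with base R alone. The peak-size matrix below is made up (rows are knowns, columns are enzyme/primer combinations), and the method names and heights are illustrative choices, not necessarily the values group.knowns uses by default.

```r
# Made-up peak sizes: rows are knowns, columns are enzyme/primer combinations.
peaks <- rbind(
  k1 = c(101, 250, 330),
  k2 = c(102, 251, 331),
  k3 = c(180, 400,  90),
  k4 = c(181, 402,  91)
)

d  <- dist(peaks, method = "maximum")   # like dist.method: largest per-column difference
hc <- hclust(d, method = "complete")    # like hclust.method: complete linkage
cutree(hc, h = 2.5)                     # like cut.height = 2.5: {k1, k2} and {k3, k4}
cutree(hc, h = 1.5)                     # lower cut: k3 and k4 are now separate
```

Raising h merges more knowns, which is why a larger cut.height makes groups more inclusive.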
Knowns are grouped together iteratively, so that all groups sharing a common cluster are merged, and all knowns that share a common species name are merged. In certain cases this may chain together seemingly unrelated groups: for example, knowns A and C end up in the same group if A clusters with B and B shares a species name with C.
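The iterative merging amounts to finding connected components: two knowns end up in the same group if they are linked by a chain of shared clusters or shared species names. The sketch below uses a hypothetical merge.groups helper and toy inputs; it illustrates the rule as described here and is not TRAMP's own code.

```r
# Hypothetical helper (not a TRAMP function). 'cluster' holds the cutree()
# cluster id of each known; 'species' holds each known's species name.
merge.groups <- function(cluster, species) {
  group <- seq_along(cluster)               # start: every known on its own
  repeat {
    old <- group
    # each known takes the smallest label found within its cluster ...
    group <- ave(group, cluster, FUN = min)
    # ... then the smallest label found among knowns of its species
    group <- ave(group, species, FUN = min)
    if (identical(group, old)) break        # stop once nothing changes
  }
  as.integer(factor(group))                 # relabel groups as 1, 2, 3, ...
}

# k1 and k3 share neither a cluster nor a species, yet are chained via k2;
# k4 and k5 sit in different clusters but share a species.
cluster <- c(1, 1, 2, 3, 4)
species <- c("A", "B", "B", "C", "C")
merge.groups(cluster, species)  # 1 1 1 2 2
```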
Because group.knowns is generic, it can be run on either a TRAMPknowns or a TRAMP object. When run on a TRAMP object, it updates the TRAMPknowns object stored as x$knowns, so that subsequent calls to plot.TRAMPknowns or summary.TRAMPknowns (for example) will use the new grouping parameters.
Parameters set by group.knowns are retained as part of the object, so that the same grouping parameters are used when adding additional knowns (add.known and combine) or when subsetting a knowns database (see [.TRAMPknowns, aka TRAMPindexing).
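The dispatch and parameter-retention behaviour described above follows R's usual S3 pattern. The sketch below uses a hypothetical generic regroup and made-up classes toyknowns and toytramp (none of these are part of TRAMP). It also shows that, because R copies objects on modification, regrouping inside a TRAMP-like object leaves the original knowns object unchanged.

```r
# Hypothetical generic (not part of TRAMP) following the S3 pattern above.
regroup <- function(x, ...) UseMethod("regroup")

# Knowns-like method: a supplied cut.height is stored in x$cluster.pars;
# if it is missing, the previously stored value is kept and reused.
regroup.toyknowns <- function(x, cut.height, ...) {
  if (!missing(cut.height))
    x$cluster.pars$cut.height <- cut.height
  # (the actual grouping, using x$cluster.pars, is omitted in this sketch)
  x
}

# TRAMP-like method: regroup only the embedded knowns object.
regroup.toytramp <- function(x, ...) {
  x$knowns <- regroup(x$knowns, ...)
  x
}

kn  <- structure(list(cluster.pars = list(cut.height = 2.5)),
                 class = "toyknowns")
res <- structure(list(knowns = kn), class = "toytramp")

res2 <- regroup(res, cut.height = 100)
res2$knowns$cluster.pars$cut.height           # 100
regroup(res2)$knowns$cluster.pars$cut.height  # still 100: stored value reused
kn$cluster.pars$cut.height                    # 2.5: original knowns unchanged
```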
Value

For group.knowns.TRAMPknowns, a new TRAMPknowns object. The cluster.pars element will have been updated with the new parameters, if any were specified.

For group.knowns.TRAMP, a new TRAMP object with an updated knowns element. Note that the original TRAMPknowns object (i.e. the one from which the TRAMP object was constructed) will not be modified.
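Warning

Missing data: where there are NA values in certain enzyme/primer combinations, NAs may be present in the final distance matrix, which means hclust cannot be used to generate the clusters. This happens when two knowns share no enzyme/primer combination in which both have non-missing values, because dist then returns NA for that pair. In general, NA values are fine; they just can't be everywhere.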
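The following base R sketch, with made-up peak sizes, shows how this arises: dist drops any enzyme/primer combination in which either known is missing, so a pair of knowns with no combination in common gets an NA distance, and hclust then rejects the matrix.

```r
# Made-up peak sizes with missing values; k3 shares no non-missing
# enzyme/primer combination with k1 or k2.
peaks <- rbind(
  k1 = c(101, 250,  NA,  NA),
  k2 = c(102,  NA, 330,  NA),
  k3 = c( NA,  NA,  NA, 400)
)

d <- dist(peaks, method = "maximum")
as.matrix(d)   # k1-k2 = 1 (only column 1 is shared); k3 vs the others = NA

# hclust() cannot work with NA distances, so this call fails
ok <- tryCatch({ hclust(d); TRUE }, error = function(e) FALSE)
ok             # FALSE
```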
References

Avis PG, Dickie IA, Mueller GM (2006). A ‘dirty’ business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Molecular Ecology 15: 873-882.
See Also

TRAMPknowns, which describes the TRAMPknowns object.
build.knowns, which attempts to generate a knowns database from a TRAMPsamples data set.
plot.TRAMPknowns, which graphically displays the relationships between knowns.
Examples

library(TRAMP)
data(demo.knowns)
data(demo.samples)
demo.knowns <- group.knowns(demo.knowns, cut.height=2.5)
plot(demo.knowns)
## Increasing cut.height makes groups more inclusive:
plot(group.knowns(demo.knowns, cut.height=100))
res <- TRAMP(demo.samples, demo.knowns)
m1.ungrouped <- summary(res)
m1.grouped <- summary(res, group=TRUE)
ncol(m1.grouped) # 94 groups
res2 <- group.knowns(res, cut.height=100)
m2.ungrouped <- summary(res2)
m2.grouped <- summary(res2, group=TRUE)
ncol(m2.grouped) # Now only 38 groups
## group.knowns results in the same distance matrix as produced by
## TRAMP, therefore using the same method (e.g. method="maximum") is
## important. The example below shows how the matrix produced by
## dist(summary(x)) (as calculated by group.knowns) is the same as that
## produced by TRAMP:
f <- function(x, method="maximum") {
## Create a pseudo-samples object from our knowns
y <- x
y$data$height <- 1
names(y$info)[names(y$info) == "knowns.pk"] <- "sample.pk"
names(y$data)[names(y$data) == "knowns.fk"] <- "sample.fk"
class(y) <- "TRAMPsamples"
## Run TRAMP, clean up and return
## (If method != "maximum", rescale the error to match that
## generated by dist()).
z <- TRAMP(y, x, method=method)
if ( method != "maximum" ) z$error <- z$error * z$n
names(dimnames(z$error)) <- NULL
z
}
g <- function(x, method="maximum")
as.matrix(dist(summary(x), method=method))
all.equal(f(demo.knowns, "maximum")$error, g(demo.knowns, "maximum"))
all.equal(f(demo.knowns, "euclidian")$error, g(demo.knowns, "euclidian"))
all.equal(f(demo.knowns, "manhattan")$error, g(demo.knowns, "manhattan"))
## However, TRAMP is over 100 times slower in this special case.
system.time(f(demo.knowns))
system.time(g(demo.knowns))