View source: R/discrete_discover.R
discrete_discover (R Documentation)

Description

discrete_discover takes as input sample-by-sample dissimilarity measurements (generated from microbial abundance profiles) and performs unsupervised clustering within each batch across a range of cluster numbers. It then evaluates the support for each cluster number using both internal data (i.e., samples within the batch) and external data (i.e., samples in other batches). Internal evaluation uses prediction.strength, and external evaluation is based on a generalized version of the same method. discrete_discover returns the evaluation statistics for each cluster number. A cluster number with good support from both internal and external evaluations provides meta-analytical evidence for discrete structure in the microbial abundance profiles.
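
prediction.strength (from the fpc package) computes the prediction strength statistic of Tibshirani and Walther. As a rough illustration of what is measured for a single split, the hypothetical helper below (not part of MMUPHin or fpc) takes two labelings of the same held-out samples: the labels from clustering those samples directly, and the labels obtained by assigning them to clusters learned on the other part of the data. For each cluster in the first labeling it computes the fraction of within-cluster pairs that the second labeling also places together, then reports the minimum over clusters. Skipping singleton clusters is a choice made in this sketch; fpc additionally repeats the random split M times and averages the results.

```r
# Hypothetical helper: prediction strength for one train/test split.
# test_labels: clusters obtained by clustering the test samples directly
# pred_labels: clusters assigned to the same test samples by the
#              clustering learned on the training samples
prediction_strength_once <- function(test_labels, pred_labels) {
  stopifnot(length(test_labels) == length(pred_labels))
  per_cluster <- sapply(unique(test_labels), function(k) {
    idx <- which(test_labels == k)
    if (length(idx) < 2) return(NA_real_)  # no pairs to evaluate
    pairs <- combn(idx, 2)                 # all within-cluster pairs
    # fraction of pairs that the training clustering also co-assigns
    mean(pred_labels[pairs[1, ]] == pred_labels[pairs[2, ]])
  })
  # the statistic is the worst-case (minimum) cluster
  min(per_cluster, na.rm = TRUE)
}

prediction_strength_once(c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 2, 2, 2))
# returns 1/3: cluster 1 keeps only one of its three pairs together
```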

Usage

discrete_discover(D, batch, data, control)

Arguments

D: sample-by-sample dissimilarity measurements. Should be provided as a dist object.
batch: name of the batch variable. This variable in data should be a factor and will be converted to one, with a warning, if it is not.
data: data frame of metadata; its columns must include batch.
control: a named list of additional control parameters. See Details.
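
Because a non-factor batch variable is coerced with a warning, it can be cleaner to convert it beforehand. The sketch below uses a small hypothetical metadata frame; in the Examples this role is played by CRC_meta with batch = "studyID".

```r
# Toy metadata (hypothetical); sample and study names are made up
meta <- data.frame(sample = paste0("s", 1:4),
                   studyID = c("study_A", "study_A", "study_B", "study_B"),
                   stringsAsFactors = FALSE)

# Convert the batch column to a factor explicitly, so that
# discrete_discover does not need to coerce it (and warn)
batch <- "studyID"
if (!is.factor(meta[[batch]])) {
  meta[[batch]] <- factor(meta[[batch]])
}
levels(meta[[batch]])
```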

Details

control should be provided as a named list containing any subset of the following components:
integer. Maximum number of clusters to evaluate. discrete_discover evaluates clustering structures for cluster numbers ranging from 2 to k_max. Defaults to 10.
an interface function used for unsupervised clustering during discrete structure evaluation. It corresponds to the clustermethod parameter in prediction.strength and, likewise, should follow the specifications detailed in clusterboot. Defaults to claraCBI.
character. Classification method used to assign observations during the internal and external evaluation stages. Corresponds to the classification parameter in prediction.strength and must be either "centroid" or "knn". Defaults to "centroid".
integer. Number of random iterations used to partition the batch during internal evaluation. Corresponds to the M parameter in prediction.strength. Defaults to 30.
integer. Number of nearest neighbors used when classify_method = "knn". Corresponds to the nnk parameter in prediction.strength. Defaults to 1.
character. Name of the generated diagnostic figure file. Defaults to "discrete_diagnostic.pdf". Can be set to NULL, in which case no figure is generated.
logical. Indicates whether verbose information is printed.
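
The classification method controls how held-out observations are assigned to clusters learned on the other part of the data. The hypothetical helper below (not the fpc or MMUPHin code) sketches both options using only dissimilarities: "centroid" assigns each new observation to the nearest cluster center, and "knn" takes a majority vote among the nnk nearest clustered observations. Since no coordinates are available, this sketch represents each cluster center by its medoid (the member with the smallest summed dissimilarity to the other members); that is an assumption and may differ from the exact rule fpc uses.

```r
# Hypothetical helper: assign new observations to existing clusters.
# d_train:      square dissimilarity matrix among clustered observations
# train_labels: cluster labels of the clustered observations
# d_new_train:  dissimilarities, rows = new obs, columns = clustered obs
classify_by_dissimilarity <- function(d_train, train_labels, d_new_train,
                                      method = c("centroid", "knn"),
                                      nnk = 1) {
  method <- match.arg(method)
  d_train <- as.matrix(d_train)
  d_new_train <- as.matrix(d_new_train)
  clusters <- sort(unique(train_labels))
  if (method == "centroid") {
    # Medoid of each cluster: member with smallest summed dissimilarity
    medoids <- sapply(clusters, function(k) {
      idx <- which(train_labels == k)
      idx[which.min(rowSums(d_train[idx, idx, drop = FALSE]))]
    })
    # Each new observation goes to the cluster with the nearest medoid
    nearest <- apply(d_new_train[, medoids, drop = FALSE], 1, which.min)
    return(clusters[nearest])
  }
  # k-nearest-neighbor majority vote (ties go to the first cluster)
  apply(d_new_train, 1, function(d) {
    nn <- order(d)[seq_len(nnk)]
    votes <- table(factor(train_labels[nn], levels = clusters))
    clusters[which.max(votes)]
  })
}

# Usage: points at 0, 1 (cluster 1) and 10, 11 (cluster 2); new points 2 and 9
d_train <- as.matrix(dist(c(0, 1, 10, 11)))
d_new <- abs(outer(c(2, 9), c(0, 1, 10, 11), "-"))
classify_by_dissimilarity(d_train, c(1, 1, 2, 2), d_new, "centroid")
```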

Value

A list with the following components:
matrices of internal clustering structure evaluation measurements (prediction strengths). Columns and rows correspond to different batches and different numbers of clusters, respectively. internal_mean and internal_se, as the names suggest, are the mean and standard error of the prediction strengths for each batch/cluster number.
matrices with the same structure as internal_mean and internal_se, but recording external clustering structure evaluation measurements (generalized prediction strengths).
list of additional control parameters used in the function call.
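
A cluster number is most convincing when it is well supported both internally and externally. One simple way to summarize the two sets of mean matrices, sketched below as a heuristic and not as a rule from MMUPHin, is to average each row across batches, take the smaller of the internal and external averages, and report the largest cluster number whose score reaches a threshold (Tibshirani and Walther suggest thresholds around 0.8 to 0.9 for prediction strength). The matrices are made-up toy numbers, and the helper assumes rows are ordered by cluster number starting at 2.

```r
# Hypothetical helper: largest cluster number whose batch-averaged internal
# AND external prediction strengths both reach `threshold`.
# Assumes rows correspond to cluster numbers 2, 3, ... in order.
pick_cluster_number <- function(internal, external, threshold = 0.8) {
  stopifnot(identical(dim(internal), dim(external)))
  ks <- 2L + seq_len(nrow(internal)) - 1L
  score <- pmin(rowMeans(internal), rowMeans(external))
  supported <- ks[score >= threshold]
  if (length(supported) == 0) NA_integer_ else max(supported)
}

# Toy matrices (made-up values): rows = cluster numbers 2-4, columns = batches
toy_internal <- matrix(c(0.95, 0.90,
                         0.85, 0.60,
                         0.40, 0.35),
                       nrow = 3, byrow = TRUE)
toy_external <- toy_internal - 0.05

pick_cluster_number(toy_internal, toy_external)
# returns 2
```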

Author(s)

Siyuan Ma, siyuanma@g.harvard.edu
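
Examples

```r
library(MMUPHin)
library(vegan)
data("CRC_abd", "CRC_meta")
# Calculate Bray-Curtis dissimilarity between the samples
# (CRC_abd is features-by-samples, so transpose it for vegdist)
D <- vegdist(t(CRC_abd))
fit_discrete <- discrete_discover(D = D,
                                  batch = "studyID",
                                  data = CRC_meta)
# Mean internal prediction strengths (cluster numbers by batches)
fit_discrete$internal_mean
```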