discrete_discover: Unsupervised meta-analytical discovery and validation of...


discrete_discover R Documentation

Unsupervised meta-analytical discovery and validation of discrete clustering structures in microbial abundance data

Description

discrete_discover takes as input sample-by-sample dissimilarity measurements (generated from microbial abundance profiles), and performs unsupervised clustering within each batch across a range of cluster numbers. It then evaluates the support for each cluster number with both internal (i.e., samples within the batch) and external (i.e., samples in other batches) data. Internal evaluation is realized with prediction.strength and external evaluation is based on a generalized version of the same method. discrete_discover generates as output the evaluation statistics for each cluster number. A cluster number with good support from both internal and external evaluations provides meta-analytical evidence for discrete structures in the microbial abundance profiles.

Usage

discrete_discover(D, batch, data, control)

Arguments

D

sample-by-sample dissimilarity measurements. Should be provided as a dist object.

batch

name of the batch variable. This variable in data should be a factor; if it is not, it will be converted to one with a warning.

data

data frame of metadata. Its columns must include batch.

control

a named list of additional control parameters. See details.
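
As a minimal sketch of the input types described above, the snippet below builds a dist object and a metadata data frame whose batch column is a factor. The objects m, D_toy and meta_toy are made-up toy values for illustration only, and keeping the sample labels of D in the same order as the rows of data is an assumption here, not a documented requirement.

```r
# Toy inputs for illustration only (made-up values, not real data).
samples <- c("s1", "s2", "s3")
m <- matrix(c(0.0, 0.2, 0.5,
              0.2, 0.0, 0.4,
              0.5, 0.4, 0.0),
            nrow = 3, dimnames = list(samples, samples))

# D must be a dist object; as.dist() converts a square symmetric matrix.
D_toy <- as.dist(m)

# The batch column should be a factor; converting it up front avoids the warning.
meta_toy <- data.frame(studyID = c("A", "A", "B"),
                       row.names = samples,
                       stringsAsFactors = FALSE)
meta_toy$studyID <- factor(meta_toy$studyID)
```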

Details

control should be provided as a named list of the following components (can be a subset).

k_max

integer. Maximum number of clusters to evaluate. discrete_discover will evaluate clustering structures for cluster numbers ranging from 2 to k_max. Defaults to 10.

cluster_function

an interface function used for unsupervised clustering during discrete structure evaluation. This corresponds to the clustermethod parameter in prediction.strength and should likewise follow the specifications detailed in clusterboot. Defaults to claraCBI.

classify_method

character. Classification method used to assign observations in the method's internal and external evaluation stages. Corresponds to the classification parameter in prediction.strength, and must be either "centroid" or "knn". Defaults to "centroid".

M

integer. Number of random iterations used to partition the batch during the method's internal evaluation. Corresponds to the M parameter in prediction.strength. Defaults to 30.

nnk

integer. Number of nearest neighbors if classify_method="knn". Corresponds to the nnk parameter in prediction.strength. Defaults to 1.

diagnostic_plot

character. Name for the generated diagnostic figure file. Defaults to "discrete_diagnostic.pdf". Can be set to NULL, in which case no figure will be generated.

verbose

logical. Indicates whether or not verbose information will be printed.
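
The components above can be combined into a control list as sketched below. The specific values are arbitrary choices for illustration; any component left out is expected to keep its documented default. The commented call reuses D and CRC_meta from the Examples section.

```r
# A sketch of a control list; the values are illustrative, not recommendations.
control <- list(
  k_max = 5,                # evaluate cluster numbers 2 through 5
  classify_method = "knn",  # one of "centroid" or "knn"
  nnk = 3,                  # only used when classify_method = "knn"
  M = 30,                   # random partitions for internal evaluation
  diagnostic_plot = NULL,   # NULL suppresses the diagnostic figure
  verbose = FALSE
)

# Passed to the function as:
# fit_discrete <- discrete_discover(D = D, batch = "studyID",
#                                   data = CRC_meta, control = control)
```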

Value

a list, with the following components:

internal_mean, internal_se

matrices of internal clustering structure evaluation measurements (prediction strengths). Columns and rows correspond to different batches and different numbers of clusters, respectively. internal_mean and internal_se, as the names suggest, are the mean and standard error of prediction strengths for each batch/cluster number.

external_mean, external_se

same structure as internal_mean and internal_se, but records external clustering structure evaluation measurements (generalized prediction strength).

control

list of additional control parameters used in the function call.
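
One way to read the returned matrices is to average the support for each cluster number across batches, as in the sketch below. summarize_support is a hypothetical helper, not part of MMUPHin, and averaging across batches is only a simple heuristic. It also assumes that the rows are ordered from 2 clusters up to k_max; pass k_values explicitly if that does not hold.

```r
# Hypothetical helper (not part of MMUPHin): per-cluster-number support,
# averaged across batches (columns) for internal and external evaluation.
# Assumption: row i of each matrix corresponds to i + 1 clusters.
summarize_support <- function(fit,
                              k_values = seq_len(nrow(fit$internal_mean)) + 1) {
  data.frame(
    k = k_values,
    internal = rowMeans(fit$internal_mean, na.rm = TRUE),
    external = rowMeans(fit$external_mean, na.rm = TRUE),
    row.names = NULL
  )
}

# Usage with the object from the Examples section:
# summarize_support(fit_discrete)
```

Cluster numbers where both columns are high are the candidates for discrete structure; the standard-error matrices indicate how stable each estimate is.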

Author(s)

Siyuan Ma, siyuanma@g.harvard.edu

Examples

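library(MMUPHin)
library(vegan)
# Load the example abundance table and sample metadata shipped with the package
data("CRC_abd", "CRC_meta")
# Calculate Bray-Curtis dissimilarity between the samples
# (vegdist expects samples in rows, hence the transpose)
D <- vegdist(t(CRC_abd))
# Evaluate cluster numbers within and across studies using the default control
fit_discrete <- discrete_discover(D = D,
                                  batch = "studyID",
                                  data = CRC_meta)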

biobakery/MMUPHin documentation built on March 30, 2024, 4:50 a.m.