DmmParam-class: Dirichlet multinomial mixture clustering

DmmParam-class R Documentation

Dirichlet multinomial mixture clustering

Description

Apply the Dirichlet multinomial mixture (DMM) algorithm from the DirichletMultinomial package. This is commonly used in microbial ecology and in analyses of metagenomic and 16S rRNA count data.

Usage

DmmParam(k = 1:3, type = "laplace", seed = NULL, BPPARAM = SerialParam())

## S4 method for signature 'ANY,DmmParam'
clusterRows(x, BLUSPARAM, full = FALSE)

Arguments

k

An integer vector specifying the number of clusters to create with the DMM algorithm. If k contains two or more values, clusterRows performs a clustering for each value and chooses the optimal number of clusters according to type.

type

A string specifying the method used to find the optimal number of clusters. Must be one of "laplace", "AIC" or "BIC". Only used when k contains multiple values.

seed

Integer scalar specifying the seed to use. If NULL, a random value is used on each invocation of clusterRows.

BPPARAM

A BiocParallelParam object indicating how multiple clusterings should be parallelized. Only relevant if k contains multiple values.

x

A numeric matrix-like object where rows represent observations and columns represent variables. Values are expected to be counts.

BLUSPARAM

A BlusterParam object specifying the algorithm to use.

full

Logical scalar indicating whether the full clustering statistics should be returned for each method.
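
As a rough illustration of how these arguments combine, the sketch below builds a DmmParam with a fixed seed and checks that two calls on the same data give the same assignments. The simulated matrix and the variable names (counts, param, first, second) are made up for this example, and it assumes that the bluster and DirichletMultinomial packages are installed.

```r
library(bluster)

# Hypothetical count matrix: 30 observations (rows), 20 features (columns).
set.seed(1)
counts <- matrix(rpois(30 * 20, lambda = 5), ncol = 20)

# Try k = 1 and k = 2, choose between them by BIC, and fix the seed.
param <- DmmParam(k = 1:2, type = "BIC", seed = 42L)

first <- clusterRows(counts, param)
second <- clusterRows(counts, param)

# With a fixed seed, repeated calls are expected to agree.
identical(first, second)
```

Leaving seed as NULL would instead draw a new random seed on each call, so repeated runs may differ.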

Details

To inspect or modify an existing DmmParam object x, users can call x[[i]] or x[[i]] <- value, where i is the name of any argument used in the constructor.
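
A minimal sketch of this accessor pattern is shown here. The object name param and the chosen values are only illustrative, and the sketch assumes the bluster package is installed.

```r
library(bluster)

param <- DmmParam(k = 1:3)

# Read the current settings.
param[["k"]]
param[["type"]]

# Update settings on the existing object.
param[["type"]] <- "AIC"
param[["seed"]] <- 100L
```

The names passed to [[ are the same as the constructor arguments, so there is no separate set of getter or setter functions to learn.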

Value

The DmmParam constructor will return a DmmParam object with the specified parameters.

The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. If full=TRUE, a list is returned with clusters (the factor, as above) and objects; the latter is a list containing:

  • dmm, a list containing the output of dmn for each value of k.

  • best, an integer scalar specifying the best choice of k according to the method specified by type.

  • prob, a matrix of probabilities where each row is an observation and each column is a cluster. The number of columns is set to the best number of clusters in best.

  • seed, an integer scalar specifying the seed used for clustering.
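
The following sketch shows one way to look at the pieces of the full=TRUE output described above. The simulated matrix and the names counts and res are assumptions made for this example, and it assumes that the bluster and DirichletMultinomial packages are installed.

```r
library(bluster)

# Hypothetical count matrix: 30 observations (rows), 20 features (columns).
set.seed(2)
counts <- matrix(rpois(30 * 20, lambda = 5), ncol = 20)

res <- clusterRows(counts, DmmParam(k = 1:2, seed = 42L), full = TRUE)

# Cluster assignments, one per row of 'counts'.
table(res$clusters)

# Additional objects: chosen k, per-observation probabilities,
# the seed that was used, and one fitted model per value of k.
res$objects$best
dim(res$objects$prob)
res$objects$seed
length(res$objects$dmm)
```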

Author(s)

Basil Courbayre

References

Holmes I, Harris K and Quince C (2012). Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE, 7(2), 1-15

Examples

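library(bluster)

# Mocking up a small example: two groups of 20 observations each,
# with Poisson counts for 50 features and separate per-feature rates per group.
nfeatures <- 50
out1 <- matrix(rpois(20 * nfeatures, lambda = rgamma(nfeatures, 5)), ncol=nfeatures, byrow=TRUE)
out2 <- matrix(rpois(20 * nfeatures, lambda = rgamma(nfeatures, 5)), ncol=nfeatures, byrow=TRUE)
out <- rbind(out1, out2)

# Clustering the rows with the default DmmParam settings.
clusterRows(out, DmmParam())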


LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.