MEDseq_clustnames: Automatic labelling of clusters using central sequences
In MEDseq: Mixtures of Exponential-Distance Models with Covariates

MEDseq_clustnames

R Documentation

Automatic labelling of clusters using central sequences

Description

These functions extract names for clusters according to the state-permanence-sequence (SPS) representation of their central sequences.
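As a rough illustration of what an SPS representation looks like, the following base-R sketch collapses a sequence of states into consecutive (state, duration) spells. The helper `to_sps` is hypothetical and not part of MEDseq, and the delimiters (parentheses, comma, hyphen) are assumed to follow the default SPS style used by TraMineR; the exact strings returned by `MEDseq_clustnames` may differ.

```r
# Hypothetical helper (not part of MEDseq): collapse a vector of states
# into an SPS-style string of consecutive (state,duration) spells.
to_sps <- function(states) {
  runs <- rle(as.character(states))           # run-length encode the states
  paste0("(", runs$values, ",", runs$lengths, ")", collapse = "-")
}

# A toy central sequence: 3 periods of school, 2 of further education,
# then 4 of employment
central <- c("SC", "SC", "SC", "FE", "FE", "EM", "EM", "EM", "EM")
to_sps(central)
# "(SC,3)-(FE,2)-(EM,4)"
```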

Usage

MEDseq_clustnames(x,
                  cluster = TRUE,
                  size = FALSE,
                  MAP = FALSE,
                  weighted = FALSE,
                  ...)

MEDseq_nameclusts(names)

Arguments

`x`	An object of class `"MEDseq"` generated by `MEDseq_fit` or an object of class `"MEDseqCompare"` generated by `MEDseq_compare`.
`cluster`	A logical indicating whether names should be prepended with the text “`Cluster g:` ”, where `g` is the cluster number. Defaults to `TRUE`.
`size`	A logical indicating whether the (typically ‘soft’) size of each cluster is appended to the label of each group, expressed as a percentage of the total number of observations. Defaults to `FALSE`.
`MAP`	A logical indicating whether to use the MAP classification in the computation of the `size` of each cluster, or the ‘soft’ clustering assignment probabilities given by `x$z`. Defaults to `FALSE`, but is always `TRUE` for models fitted by the CEM algorithm (see `MEDseq_control`), and is only relevant when `size=TRUE`. See `weighted` for incorporating the sampling weights (regardless of the value of `MAP`). The `MAP` argument here plays a similar role to `map.size` in `MEDseq_meantime`.
`weighted`	A logical indicating whether the sampling weights (if any) are used when appending the `size` of each cluster to the labels. Defaults to `FALSE` and only relevant when `size=TRUE`. The `weighted` argument here plays a similar role to `wt.size` in `MEDseq_meantime`.
`...`	Catches unused arguments.
`names`	The output of `MEDseq_clustnames` to be passed to the convenience function `MEDseq_nameclusts` (see `Details`).
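To make the interplay between `size`, `MAP`, and `weighted` more concrete, the following base-R sketch computes cluster sizes as percentages from a toy matrix of assignment probabilities. The helper `cluster_sizes`, the toy matrix `z`, and the weights `w` are all hypothetical, and the way weights enter the computation is an assumption; this is not the package's internal code.

```r
# Toy 'soft' assignment probabilities (rows: observations, columns: clusters)
z <- rbind(c(0.9, 0.1),
           c(0.6, 0.4),
           c(0.2, 0.8),
           c(0.4, 0.6))
w <- c(1, 2, 1, 4)   # hypothetical sampling weights

# Hypothetical helper (not part of MEDseq): percentage size of each cluster
cluster_sizes <- function(z, MAP = FALSE, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, nrow(z))
  if (MAP) {
    # Replace soft probabilities by hard 0/1 indicators of the MAP cluster
    hard <- max.col(z, ties.method = "first")
    z    <- diag(ncol(z))[hard, , drop = FALSE]
  }
  # Each row is scaled by its weight before summing over observations
  100 * colSums(z * weights) / sum(weights)
}

cluster_sizes(z)                            # soft, unweighted: approx. 52.5 47.5
cluster_sizes(z, MAP = TRUE)                # MAP, unweighted: 50 50
cluster_sizes(z, weights = w)               # soft, weighted: approx. 48.75 51.25
cluster_sizes(z, MAP = TRUE, weights = w)   # MAP, weighted: 37.5 62.5
```

In this toy example the four settings give four different sets of sizes, which is why `MAP` and `weighted` only matter when `size=TRUE`.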

Details

Unlike the seqclustname function from the WeightedCluster package which inspired these functions, MEDseq_clustnames only returns the names themselves, not the factor variable indicating cluster membership with labels given by those names. Thus, MEDseq_nameclusts is provided as a convenience function for precisely this purpose (see Examples).
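Conceptually, turning a vector of names into a labelled membership indicator amounts to building a factor. The base-R sketch below shows the idea with made-up data; `map_class` and `clust_names` are hypothetical stand-ins for `x$MAP` and the output of `MEDseq_clustnames`, and the code is not the implementation of `MEDseq_nameclusts`.

```r
# Hypothetical MAP classification of five observations into three clusters
map_class   <- c(1L, 2L, 2L, 1L, 3L)

# Hypothetical cluster names, one per cluster, in cluster order
clust_names <- c("(SC,3)-(EM,6)", "(FE,4)-(EM,5)", "(TR,9)")

# Factor whose levels are labelled by the cluster names
group <- factor(map_class,
                levels = seq_along(clust_names),
                labels = clust_names)

table(group)   # number of observations per named cluster
```

A factor of this kind can be supplied wherever a grouping variable is expected, for example as the `group` argument of a sequence plot.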

Value

For MEDseq_clustnames, a character vector containing the names for each component defined by their central sequence, and optionally the cluster name (see cluster above) and cluster size (see size above). The name for the noise component, if any, will always be simply "Noise" (or "Cluster 0: Noise").
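The structure of the returned names can be sketched in base R as follows. The SPS strings, the helper `make_labels`, and the placement of the noise component at the end of the vector are all assumptions made for illustration; only the "Cluster g: " prefix and the "Noise" and "Cluster 0: Noise" labels are taken from the description above.

```r
# Hypothetical SPS strings for two non-noise components
sps       <- c("(SC,3)-(EM,6)", "(FE,4)-(EM,5)")
has_noise <- TRUE   # assume the fitted model includes a noise component

# Hypothetical helper (not part of MEDseq): compose the labels
make_labels <- function(sps, has_noise = FALSE, cluster = TRUE) {
  base <- c(sps, if (has_noise) "Noise")          # noise is named "Noise"
  idx  <- c(seq_along(sps), if (has_noise) 0L)    # noise is numbered 0
  if (cluster) paste0("Cluster ", idx, ": ", base) else base
}

make_labels(sps, has_noise, cluster = TRUE)
# "Cluster 1: (SC,3)-(EM,6)" "Cluster 2: (FE,4)-(EM,5)" "Cluster 0: Noise"
make_labels(sps, has_noise, cluster = FALSE)
# "(SC,3)-(EM,6)" "(FE,4)-(EM,5)" "Noise"
```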

For MEDseq_nameclusts, a factor version of x$MAP with levels given by the output of MEDseq_clustnames.

Note

The main MEDseq_clustnames function is used internally by plot.MEDseq, MEDseq_meantime, MEDseq_stderr, and other print and summary methods, where its invocation can typically be controlled via a logical SPS argument. However, the optional arguments cluster, size, MAP, and weighted can only be passed through plot.MEDseq; elsewhere cluster=TRUE, size=FALSE, MAP=FALSE, and weighted=FALSE are always assumed. When invoked within plot.MEDseq, the MAP argument is renamed to soft, where MAP=!soft, such that soft=TRUE by default.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.

Examples


# Load the packages (TraMineR provides seqdef and seqplot)
library(MEDseq)
library(TraMineR)

# Load the MVAD data
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a model with weights and a gating covariate
# Have the probability of noise-component membership depend on the covariate
mod    <- MEDseq_fit(mvad.seq, G=5, modtype="UUN", weights=mvad$weights, 
                     gating=~ gcse5eq, covars=mvad.cov, noise.gate=TRUE)
                     
# Extract the names
names  <- MEDseq_clustnames(mod, cluster=FALSE, size=TRUE)

# Get the renamed MAP cluster membership indicator vector
group  <- MEDseq_nameclusts(names)

# Use the output in plots
plot(mod, type="d", soft=FALSE, weighted=FALSE, cluster=FALSE, size=TRUE, border=TRUE)
# same as:
# seqplot(mvad.seq, type="d", group=group)

# Indeed, this function is invoked by default for certain plot types
plot(mod, type="d", soft=TRUE, weighted=TRUE)
plot(mod, type="d", soft=TRUE, weighted=TRUE, SPS=FALSE)

# Invoke this function when printing the gating network coefficients
print(mod$gating, SPS=FALSE)
print(mod$gating, SPS=TRUE)

# Invoke this function in a call to MEDseq_meantime
MEDseq_meantime(mod, SPS=TRUE)
 
# Invoke this function in other plots
plot(mod, type="clusters", SPS=TRUE, size=TRUE)
plot(mod, type="precision", SPS=TRUE, size=TRUE, weighted=FALSE)

MEDseq documentation built on April 4, 2025, 5:26 a.m.