MEDseq_clustnames: Automatic labelling of clusters using central sequences

MEDseq_clustnames    R Documentation

Automatic labelling of clusters using central sequences

Description

These functions extract names for clusters according to the state-permanence-sequence (SPS) representation of their central sequences.
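To illustrate what an SPS label looks like, the following minimal sketch compresses a sequence into its distinct successive states and their durations using base R only. The toy sequence `s` is a hypothetical stand-in; this is not the code used internally by MEDseq or TraMineR.

```r
# Hypothetical toy sequence: 3 periods in school, 2 in further education, 1 in employment
s   <- c("SC", "SC", "SC", "FE", "FE", "EM")

# Run-length encode the sequence: one entry per spell (state, duration)
r   <- rle(s)

# Assemble an SPS-style string of the form (state,duration)-(state,duration)-...
sps <- paste0("(", r$values, ",", r$lengths, ")", collapse = "-")
sps
```

Each spell becomes one "(state,duration)" pair, which is why SPS strings make compact cluster names for central sequences.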

Usage

MEDseq_clustnames(x,
                  cluster = TRUE,
                  size = FALSE,
                  weighted = FALSE,
                  ...)

MEDseq_nameclusts(names)

Arguments

x

An object of class "MEDseq" generated by MEDseq_fit or an object of class "MEDseqCompare" generated by MEDseq_compare.

cluster

A logical indicating whether names should be prepended with the text "Cluster g: ", where g is the cluster number. Defaults to TRUE.

size

A logical indicating whether the (typically 'soft') size of each cluster is appended to the label of each group, expressed as a percentage of the total number of observations. Defaults to FALSE.

weighted

A logical indicating whether the sampling weights (if any) are used when appending the size of each cluster to the labels. Defaults to FALSE.

...

Catches unused arguments.

names

The output of MEDseq_clustnames to be passed to the convenience function MEDseq_nameclusts (see Details).
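As a rough illustration of how the cluster, size, and weighted arguments could interact, the sketch below builds labels by hand from a matrix of posterior membership probabilities. Everything here is assumed for illustration: the objects `z`, `w`, and `sps`, the helper `make_label`, and the exact label layout are hypothetical and need not match what MEDseq_clustnames produces.

```r
# Hypothetical posterior membership probabilities (rows: observations, columns: clusters)
z   <- matrix(c(0.9, 0.1,
                0.2, 0.8,
                0.6, 0.4,
                0.1, 0.9), ncol = 2, byrow = TRUE)
w   <- c(1, 2, 1, 1)                      # hypothetical sampling weights, one per observation
sps <- c("(EM,70)", "(SC,24)-(HE,46)")    # hypothetical central sequences in SPS form

# 'Soft' cluster sizes as percentages of the total number of observations
soft_size  <- 100 * colSums(z) / nrow(z)

# Weighted analogue: each row of z is scaled by its sampling weight
wsoft_size <- 100 * colSums(z * w) / sum(w)

# Hypothetical label layout: optional "Cluster g: " prefix and optional size suffix
make_label <- function(sps, size = NULL, cluster = TRUE) {
  out <- if (cluster) paste0("Cluster ", seq_along(sps), ": ", sps) else sps
  if (!is.null(size)) out <- paste0(out, " (", round(size, 1), "%)")
  out
}

make_label(sps)                                   # analogous to cluster=TRUE, size=FALSE
make_label(sps, soft_size, cluster = FALSE)       # analogous to cluster=FALSE, size=TRUE
make_label(sps, wsoft_size)                       # analogous to size=TRUE, weighted=TRUE
```

With unequal weights the two size vectors differ, which is the distinction the weighted argument controls.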

Details

Unlike the seqclustname function from the WeightedCluster package, which inspired these functions, MEDseq_clustnames returns only the names themselves, not the factor variable indicating cluster membership with labels given by those names. MEDseq_nameclusts is therefore provided as a convenience function for precisely this purpose (see Examples).

Value

For MEDseq_clustnames, a character vector containing the names for each component defined by their central sequence, and optionally the cluster name (see cluster above) and cluster size (see size above). The name for the noise component, if any, will always be simply "Noise" (or "Cluster 0: Noise").

For MEDseq_nameclusts, a factor version of x$MAP, where x is the object originally supplied to MEDseq_clustnames, with levels given by the output of MEDseq_clustnames.
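Conceptually, the relabelling amounts to converting an integer classification into a factor whose levels are the cluster names. The base R sketch below shows the idea under stated assumptions: the vectors `MAP` and `nms` are hypothetical, and the convention that the noise component is coded 0 and named last is assumed here for illustration rather than taken from the package source.

```r
# Hypothetical MAP classification; 0 is assumed to denote the noise component
MAP   <- c(1L, 3L, 2L, 2L, 0L, 1L)

# Hypothetical cluster names, with the noise component assumed to come last
nms   <- c("(EM,70)", "(FE,24)-(EM,46)", "(SC,24)-(HE,46)", "Noise")

# Non-noise clusters are numbered 1..G; the noise label 0 is matched to the last name
lev   <- c(seq_len(length(nms) - 1L), 0L)
group <- factor(MAP, levels = lev, labels = nms)

table(group)
```

A factor of this kind can then be passed as a grouping variable to plotting functions, as in the Examples.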

Note

The main MEDseq_clustnames function is used internally by plot.MEDseq, MEDseq_meantime, MEDseq_stderr, and other print and summary methods, where its invocation can typically be controlled via an SPS logical argument. However, the optional arguments cluster, size, and weighted can only be passed through plot.MEDseq; elsewhere cluster=TRUE, size=FALSE, and weighted=FALSE are always assumed.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.

See Also

seqformat, seqclustname, plot.MEDseq, MEDseq_meantime, MEDseq_stderr

Examples

# Load the package and the MVAD data
library(MEDseq)
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a model with weights and a gating covariate
# Have the probability of noise-component membership depend on the covariate
mod    <- MEDseq_fit(mvad.seq, G=5, modtype="UUN", weights=mvad$weights, 
                     gating=~ gcse5eq, covars=mvad.cov, noise.gate=TRUE)
                     
# Extract the names
names  <- MEDseq_clustnames(mod, cluster=FALSE, size=TRUE)

# Get the renamed MAP cluster membership indicator vector
group  <- MEDseq_nameclusts(names)

# Use the output in plots
plot(mod, type="d", soft=FALSE, weighted=FALSE, cluster=FALSE, size=TRUE, border=TRUE)
# same as:
# seqplot(mvad.seq, type="d", group=group)

# Indeed, this function is invoked by default for certain plot types
plot(mod, type="d", soft=TRUE, weighted=TRUE)
plot(mod, type="d", soft=TRUE, weighted=TRUE, SPS=FALSE)

# Invoke this function when printing the gating network coefficients
print(mod$gating, SPS=FALSE)
print(mod$gating, SPS=TRUE)

# Invoke this function in a call to MEDseq_meantime
MEDseq_meantime(mod, SPS=TRUE)
 
# Invoke this function in other plots
plot(mod, type="clusters", SPS=TRUE)
plot(mod, type="precision", SPS=TRUE)

MEDseq documentation built on Dec. 28, 2022, 2:35 a.m.