MEDseq-package | R Documentation |
Fits MEDseq models: mixtures of Exponential-Distance models with gating covariates and sampling weights. Typically used for clustering categorical/longitudinal life-course sequences.
Fits _MEDseq_ models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>, i.e. fits mixtures of exponential-distance models for clustering longitudinal life-course sequence data via the EM/CEM algorithm.
A family of parsimonious precision parameter constraints is accommodated, as are sampling weights. Gating covariates can be supplied via formula interfaces.
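As a rough illustration of what a mixture of exponential-distance models computes, the following base-R sketch evaluates component densities for a few toy sequences and contrasts the soft allocation of an EM-style E-step with the hard allocation of a CEM-style C-step. It assumes the Hamming distance and one precision parameter per component, neither of which is stated on this page. The helper names (hamming, log_ed_density) and all parameter values are hypothetical; this is not the package's internal code.

```r
# Hamming distance: number of positions at which two sequences differ
hamming <- function(s, theta) sum(s != theta)

# Log-density of an exponential-distance model with central sequence theta
# and precision lambda, for sequences over v states (assumption: Hamming
# distance, whose normalising constant is (1 + (v - 1) * exp(-lambda))^length)
log_ed_density <- function(s, theta, lambda, v) {
  -lambda * hamming(s, theta) - length(theta) * log1p((v - 1) * exp(-lambda))
}

# Toy setup (all values hypothetical): three states, two components
states <- c("EM", "FE", "SC")
v      <- length(states)
thetas <- list(c("SC", "SC", "FE", "EM"),   # central sequence, component 1
               c("EM", "EM", "EM", "EM"))   # central sequence, component 2
lambda <- c(1.5, 0.8)                       # one precision per component
tau    <- c(0.6, 0.4)                       # mixing proportions

seqs   <- rbind(c("SC", "SC", "FE", "EM"),
                c("EM", "EM", "EM", "FE"),
                c("SC", "FE", "FE", "EM"))

# Log of tau_g * f(s_i | theta_g, lambda_g) for every sequence and component
log_num <- sapply(seq_along(thetas), function(g) {
  apply(seqs, 1, function(s) {
    log(tau[g]) + log_ed_density(s, thetas[[g]], lambda[g], v)
  })
})

# E-step (EM): soft responsibilities, normalised on the log scale for stability
z <- exp(log_num - apply(log_num, 1, max))
z <- z / rowSums(z)

# C-step (CEM): hard allocation to the most probable component
hard <- apply(z, 1, which.max)
```

In this sketch, a larger lambda concentrates a component's probability mass around its central sequence, while lambda = 0 gives every sequence equal probability.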
The most important function in the MEDseq package is MEDseq_fit, for fitting the models via EM/CEM.
MEDseq_control allows supplying additional arguments which govern, among other things, controls on the initialisation of the allocations for the EM/CEM algorithm and the various model selection options.
MEDseq_compare is provided for conducting model selection between different results obtained using different covariate combinations and/or initialisation strategies, etc.
MEDseq_stderr is provided for computing the standard errors of the coefficients for the covariates in the gating network.
A dedicated plotting function, plot.MEDseq, exists for visualising various aspects of the results, using new methods as well as some existing methods adapted from the TraMineR package.
Finally, the package also contains two data sets: biofam and mvad.
Type: Package
Package: MEDseq
Version: 1.4.0
Date: 2022-12-20 (this version), 2019-08-24 (original release)
Licence: GPL (>=2)
Further details and examples are given in the associated vignette document:
vignette("MEDseq", package = "MEDseq")
Keefe Murphy [aut, cre], Thomas Brendan Murphy [ctb], Raffaella Piccarreta [ctb], Isobel Claire Gormley [ctb]
Maintainer: Keefe Murphy - <keefe.murphy@mu.ie>
Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.
# Attach the package (provides MEDseq_fit() and the mvad data)
library(MEDseq)

# Load the MVAD data
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) which(x == "yes")),
                        labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences  = mvad[,15:86],
                      weights    = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
# (seqdef() is the TraMineR constructor for state sequence objects)
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education",
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a range of unweighted models without covariates
# Only consider models with a noise component
# Supply some MEDseq_control() arguments
mod1          <- MEDseq_fit(mvad.seq, G=9:10, modtype=c("CCN", "CUN", "UCN", "UUN"),
                            algo="CEM", init.z="kmodes", criterion="icl")

# Fit a model with weights and gating covariates
# Have the probability of noise-component membership be constant
mod2          <- MEDseq_fit(mvad.seq, G=11, modtype="UUN", weights=mvad$weights,
                            gating=~ gcse5eq, covars=mvad.cov, noise.gate=FALSE)

# Examine this model and its gating network
summary(mod2, network=TRUE)
plot(mod2, "clusters")