R package rEMM: Extensible Markov Model for Modelling Temporal Relationships Between Clusters

version stream r-universe
status CRAN RStudio mirror

Implements TRACDS (Temporal Relationships between Clusters for Data Streams), a generalization of Extensible Markov Model (EMM), to model transition probabilities in sequence data. TRACDS adds a temporal or order model to data stream clustering by superimposing a dynamically adapting Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN data stream clustering).

Interface classes DSC_tNN and DSC_EMM for the stream package are provided.


Stable CRAN version: install from within R with


Current development version: Install from r-universe.


We use a artificial dataset with a mixture of four clusters components. Points are generated using a fixed sequence \<1,2,1,3,4> through the four clusters. The lines below indicate the sequence.



plot(EMMsim_train, pch = NA)
lines(EMMsim_train, col = "gray")
points(EMMsim_train, pch = EMMsim_sequence_train)

EMM recovers the components and the sequence information. We use EMM and then recluster the found structure assuming that we know that there are 4 components. The graph below represents a Markov model of the found sequence.

emm <- EMM(threshold = 0.1, measure = "euclidean")
build(emm, EMMsim_train)
emmc <- recluster_hclust(emm, k = 4, method = "average")

We can now score new sequences (we use a test sequence created in the same way as the training data) by calculating the product the transition probabilities in the model. The high score indicates this.

score(emmc, EMMsim_test)
## [1] 0.71



Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from the National Human Genome Research Institute.

Try the rEMM package in your browser

Any scripts or data that you put into this service are public.

rEMM documentation built on June 26, 2022, 1:06 a.m.