synth | R Documentation
A synthetic dataset used as the illustrative example in the
Journal of Statistical Software paper describing the ClickClust package.
There are 5 states, denoted A, B, C, D, and E. The categorical sequences vary in length from 10 to 50.
data(synth)
$data contains a vector of 250 strings representing categorical sequences; $id is the original classification vector.
Melnykov, V. (2015)
Melnykov, V. (2016) Model-Based Biclustering of Clickstream Data, Computational Statistics and Data Analysis, 93, 31-45.
Melnykov, V. (2016) ClickClust: An R Package for Model-Based Clustering of Categorical Sequences, Journal of Statistical Software, 74, 1-34.
click.read
data(synth)
head(synth$data)
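# FUNCTION THAT REPLACES CHARACTER STATES WITH NUMERIC VALUES
# x: character vector of states; ch.lev: vector of all distinct states
repl.levs <- function(x, ch.lev){
  # use the argument ch.lev, not the global ch.levs
  for (j in seq_along(ch.lev)) x <- gsub(ch.lev[j], j, x)
  return(x)
}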
# DETECT ALL STATES IN THE DATASET
d <- paste(synth$data, collapse = " ")
d <- strsplit(d, " ")[[1]]
ch.levs <- levels(as.factor(d))
# CONVERT DATA TO THE FORM USED BY click.read()
S <- strsplit(synth$data, " ")
S <- sapply(S, repl.levs, ch.levs)
S <- sapply(S, as.numeric)
head(S)
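The same conversion can also be sketched with match() instead of gsub(). This is an alternative illustration, not the approach used in the package documentation. It maps whole state labels rather than substrings, so it would still work if one label were contained in another (for example "A" and "AB"). The vector toy is a hypothetical stand-in for synth$data so that the block runs without the package; states and S.toy are illustrative names.

```r
# Toy stand-in for synth$data (hypothetical values, not the real dataset)
toy <- c("A B C A", "D E E", "B A")

# Split each string into its individual states
tokens <- strsplit(toy, " ", fixed = TRUE)

# Detect all distinct states, in sorted order
states <- sort(unique(unlist(tokens)))

# Map every state to its integer position in `states`
S.toy <- lapply(tokens, match, table = states)

S.toy[[1]]       # integer codes of the first sequence
lengths(S.toy)   # number of states in each sequence
```

With the real data, toy would be replaced by synth$data; the result is a list with one integer vector per sequence.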