seqpropclust: Monothetic clustering of state sequences

View source: R/propclustering.R

seqpropclust    R Documentation

Monothetic clustering of state sequences

Description

Monothetic divisive clustering of the data using object properties. For state sequence objects, several sets of properties are automatically extracted.

Usage

seqpropclust(seqdata, diss, properties = c("state", "duration", "spell.age", 
		"spell.dur", "transition", "pattern", "AFtransition", "AFpattern", 
		"Complexity"), other.prop = NULL, prop.only = FALSE, pmin.support = 0.05, 
		max.k = -1, with.missing = TRUE, R = 1, weight.permutation = "diss", 
		min.size = 0.01, max.depth = 5, maxcluster = NULL, ...)
		
wcPropertyClustering(diss, properties, maxcluster = NULL, ...)
dtcut(st, k, labels = TRUE)

Arguments

seqdata

State sequence object (see seqdef).

diss

A dissimilarity matrix or a dist object.

properties

Character or data.frame. In seqpropclust, it can be a character vector naming the properties to be extracted from seqdata. It can also be a data.frame specifying the properties to use for the clustering.

other.prop

data.frame. Additional properties to be considered to cluster the sequences.

prop.only

Logical. If TRUE, the function returns a data.frame containing the extracted properties (without clustering the data).

pmin.support

Numeric. Minimum support (as a proportion of sequences). See seqefsub.

max.k

Numeric. The maximum number of events allowed in a subsequence. See seqefsub.

with.missing

Logical. If TRUE, properties of missing spells are also extracted.

R

Number of permutations used to assess the significance of the split. See disstree.

weight.permutation

Weight permutation method: "diss" (attach weights to the dissimilarity matrix), "replicate" (replicate cases using weights), "rounded-replicate" (replicate cases using rounded weights), "random-sampling" (random assignment of covariate profiles to the objects using distributions defined by the weights). See disstree.

min.size

Minimum number of cases in a node. Values less than 1 are treated as a proportion. See disstree.

max.depth

Maximum depth of the tree. See disstree.

maxcluster

Maximum number of clusters to consider.

st

A divisive clustering tree as produced by seqpropclust.

k

The number of groups to extract.

labels

Logical. If TRUE, the rules used to assign an object to a cluster are used to label the clusters (instead of numbers).

...

Arguments passed to/from other methods.

Details

The method implements the DIVCLUS-T algorithm.
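
As a rough illustration of what "monothetic divisive" means, the base R sketch below evaluates a single split: each candidate binary property divides the objects in two, and the property giving the largest reduction in within-group discrepancy is retained. This is not the package's implementation. The toy data, the property names (p_high, p_odd) and the helper functions (within_ss, split_gain) are hypothetical, and the discrepancy formula (sum of pairwise dissimilarities divided by twice the group size) is an assumption chosen for the illustration.

```r
## Toy data: six objects on a line and two hypothetical binary properties.
x <- c(1, 2, 3, 10, 11, 12)
d <- as.matrix(dist(x))^2            # squared Euclidean dissimilarities
props <- data.frame(
  p_high = x > 5,                    # separates the two obvious groups
  p_odd  = x %% 2 == 1               # cuts across them
)

## Within-group discrepancy computed from pairwise dissimilarities only
## (assumed formula for this sketch).
within_ss <- function(d, idx) {
  if (length(idx) == 0) return(0)
  sum(d[idx, idx, drop = FALSE]) / (2 * length(idx))
}

## Gain of each candidate split: discrepancy before minus discrepancy after.
split_gain <- function(d, props) {
  all_idx <- seq_len(nrow(d))
  total <- within_ss(d, all_idx)
  sapply(props, function(p) {
    total - within_ss(d, all_idx[p]) - within_ss(d, all_idx[!p])
  })
}

gains <- split_gain(d, props)
best <- names(which.max(gains))      # the single property defining the split
```

Here p_high is selected because it separates the two tight groups. A divisive procedure would repeat the same search inside each resulting group, so that every cluster ends up described by a short chain of property rules.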

Value

Returns a seqpropclust object, which is in fact a disstree object. See disstree.

References

Studer, M. (2018). Divisive property-based and fuzzy clustering for sequence analysis. In G. Ritschard and M. Studer (Eds.), Sequence Analysis and Related Approaches: Innovative Methods and Applications, Life Course Research and Social Policies. Springer.

Piccarreta R, Billari FC (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(4), 1061-1078.

Chavent M, Lechevallier Y, Briant O (2007). DIVCLUS-T: A monothetic divisive hierarchical clustering method. Computational Statistics & Data Analysis, 52(2), 687-701.

See Also

as.clustrange, seqtreedisplay, disstree.

Examples

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:86])

## Compute dissimilarities using the Hamming distance
diss <- seqdist(mvad.seq, method="HAM")

pclust <- seqpropclust(mvad.seq, diss=diss, maxcluster=5, properties=c("state", "duration"))

## Run it to visualize the results
##seqtreedisplay(pclust, type="d", border=NA, showdepth=TRUE)

pclustqual <- as.clustrange(pclust, diss=diss, ncluster=5)
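
The prop.only argument and the dtcut function documented above are not exercised by the example. A minimal sketch of how they might be used is given below, assuming that the TraMineR and WeightedCluster packages are installed and that the tree grown on these 100 sequences has at least three leaves. The object names props and cl3 are chosen here for illustration.

```r
library(TraMineR)         # provides mvad, seqdef and seqdist
library(WeightedCluster)  # provides seqpropclust and dtcut

data(mvad)
mvad.seq <- seqdef(mvad[1:100, 17:86])
diss <- seqdist(mvad.seq, method = "HAM")

## Extract the properties only, without clustering the data
props <- seqpropclust(mvad.seq, diss = diss,
                      properties = c("state", "duration"),
                      prop.only = TRUE)

## Build the tree, then extract a three-group partition from it
pclust <- seqpropclust(mvad.seq, diss = diss, maxcluster = 5,
                       properties = c("state", "duration"))
cl3 <- dtcut(pclust, k = 3, labels = FALSE)
table(cl3)
```

Setting labels = TRUE instead should label each group with the property rules that define it, which is usually easier to interpret than group numbers.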

WeightedCluster documentation built on July 9, 2023, 5:34 p.m.