seqnullcqi {WeightedCluster}    R Documentation

Sequence Analysis Typologies Validation Using Parametric Bootstrap

Description

seqnullcqi implements the methodology proposed by Studer (2021) for the validation of sequence analysis typologies using parametric bootstraps. The method works by comparing the cluster quality of an observed typology with the quality obtained by clustering similar but nonclustered data. Several models are available to test the different structuring aspects of the sequences that are important in life-course research, namely sequencing, timing, and duration (see function seqnull). This strategy makes it possible to identify the key structural aspects captured by the observed typology. Plot and print methods for the seqnullcqi results are also provided.

Usage

seqnullcqi(seqdata, clustrange, R, model=c("combined", "duration", "sequencing",
           "stateindep", "Markov", "userpos"), seqdist.args=list(),
           kmedoid = FALSE, hclust.method="ward.D",
           parallel=FALSE, progressbar=FALSE, ...)

## S3 method for class 'seqnullcqi'
plot(x, stat, type = c("line", "density", "boxplot", "seqdplot"),
                          quant = 0.95, norm = TRUE, legendpos = "topright",
                          alpha = 0.2, ...)

## S3 method for class 'seqnullcqi'
print(x, norm=TRUE, quant=0.95, digits=2, ...) 

Arguments

seqdata

State sequence object of class stslist. The sequence data to use. Use seqdef to create such an object.

clustrange

The clustering of the data to be validated as an object of class clustrange. See as.clustrange or wcKMedRange to create such an object.

model

String. The model used to generate the similar but nonclustered data. It can be one of "combined", "duration", "sequencing", "stateindep", "Markov" or "userpos". See seqnull for more information.

R

The number of bootstrap replications.

seqdist.args

List of arguments passed to seqdist for computing the distances.

kmedoid

Logical. If TRUE, the PAM algorithm is used to cluster the data using wcKMedRange. If FALSE, hclust is used.

hclust.method

String. Hierarchical method to use with hclust.

x

A seqnullcqi object to be plotted or printed.

stat

Character. The statistic to plot or "all" for all statistics. See wcClusterQuality for a list of possible values.

type

Character. The type of graphic to be plotted. If type="line" (default), the cluster quality index of each bootstrap replication is plotted as a separate transparent line. If type="density", the density of the maximum cluster quality index values among the different numbers of groups is plotted, together with the original cluster quality values. If type="boxplot", a boxplot of the distribution of the cluster quality index values is plotted separately for each number of groups. If type="seqdplot", a state distribution plot of the sequences generated with the null model is plotted (see seqdplot).

quant

Numeric. Quantile to use for the confidence intervals.

norm

Logical. If TRUE, cluster quality indices are standardized using the mean and standard deviation of the null distribution.

legendpos

Character. Legend position; see legend.

alpha

Transparency parameter for the lines to be drawn (only for type="line").

digits

Number of digits to be printed.

parallel

Logical. Whether to initialize parallel processing with the future package using the default multisession strategy. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized with default values.

progressbar

Logical. Whether to initialize a progress bar using the future package. If FALSE (default), the current progress bar handlers are used. If TRUE, new global progress bar handlers are initialized.

...

Additional parameters passed to seqnull (for seqnullcqi), or to plot or print.
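As a rough illustration of what norm and quant refer to, the following base R sketch standardizes an observed index value with the mean and standard deviation of a null distribution (norm = TRUE) and computes the quant quantile of the null values (quant = 0.95). The null values are invented for illustration, the names null_vals, standardize and null_bound are hypothetical, and the interval actually reported by the print and plot methods may be constructed differently.

```r
# Hypothetical null distribution of one quality index (e.g. ASW for k = 3),
# as it could come from R = 5 bootstrap replications, plus an observed value.
null_vals <- c(0.30, 0.32, 0.34, 0.36, 0.38)
observed <- 0.50

# norm = TRUE: standardize using the mean and standard deviation of the null values.
standardize <- function(x, null) (x - mean(null)) / sd(null)
z_obs  <- standardize(observed, null_vals)   # observed value on the null scale
z_null <- standardize(null_vals, null_vals)  # null values now have mean 0, sd 1

# quant = 0.95: the value below which 95% of the null replications fall
# (default type-7 quantile of base R).
null_bound <- quantile(null_vals, probs = 0.95, names = FALSE)

cat("standardized observed value:", round(z_obs, 2), "\n")
cat("0.95 quantile of the null values:", null_bound, "\n")
cat("observed above null quantile:", observed > null_bound, "\n")
```

Here the observed value lies above the upper quantile of the null values, which is the kind of pattern the print and plot methods are meant to reveal.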

Details

The seqnullcqi function provides a validation method for sequence analysis typologies using parametric bootstraps, as proposed in Studer (2021). This method works by comparing the cluster quality of an observed typology with the cluster quality obtained by clustering similar but nonclustered data. More precisely, it works as follows.

  1. Cluster the observed sequence data and compute the associated cluster quality indices.

  2. Repeat R times:

    1. Generate similar but nonclustered data using a null model (see seqnull for available null models).

    2. Cluster the generated data using the same distance measure and clustering algorithm as in step 1.

    3. Record the cluster quality index values of this null clustering.

  3. Compare the quality of the observed typology with the quality obtained in the R bootstraps on the null sequence data, using the plot and print methods.

  4. If the cluster quality of the observed typology is consistently higher than that obtained with null data, a “good” typology has been found.

Several null models are provided to test the different structuring aspects of the sequences important in life-course research, namely, sequencing, timing, and duration (see function seqnull and Studer, 2021). This strategy allows identifying the key structural aspects captured by the observed typology.
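The steps listed above can be sketched in base R, outside the package, to make the logic concrete. This is a toy under several assumptions: the data are simulated, Hamming distances are computed by hand, the null model (hypothetical null_shuffle) simply shuffles states within each time position and is not claimed to be any of the seqnull models, and cluster quality is a distance-based pseudo-R² defined here (hypothetical pseudo_r2, not necessarily identical to the R2 index of wcClusterQuality). Clustering uses hclust with method="ward.D", as in the Examples.

```r
# Toy sketch of the parametric bootstrap procedure (not the package's code).
set.seed(1)
n <- 40  # number of sequences
L <- 12  # sequence length

# Simulated observed data: two trajectory types (A then B, A then C), variable timing.
make_seq <- function(end_state, switch_at) {
  c(rep("A", switch_at), rep(end_state, L - switch_at))
}
obs <- t(vapply(seq_len(n), function(i) {
  make_seq(if (i <= n / 2) "B" else "C", sample(5:7, 1))
}, character(L)))

# Hamming distance: number of positions at which two sequences differ.
hamming <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- sum(m[i, ] != m[j, ])
    }
  }
  d
}

# Hypothetical pseudo-R2: share of total discrepancy explained by the partition.
pseudo_r2 <- function(d, cl) {
  ss <- function(idx) sum(d[idx, idx]) / (2 * length(idx))
  sst <- ss(seq_len(nrow(d)))
  ssw <- sum(vapply(split(seq_len(nrow(d)), cl), ss, numeric(1)))
  1 - ssw / sst
}

# Cluster with Ward (hclust, "ward.D") and record the quality for each k.
cluster_quality <- function(m, ks) {
  d <- hamming(m)
  hc <- hclust(as.dist(d), method = "ward.D")
  vapply(ks, function(k) pseudo_r2(d, cutree(hc, k)), numeric(1))
}

# Hypothetical null model: shuffle each time position independently, keeping the
# cross-sectional state distributions but breaking individual trajectories.
null_shuffle <- function(m) apply(m, 2, sample)

ks <- 2:5
observed <- cluster_quality(obs, ks)                                   # step 1
R <- 20
null_stats <- t(replicate(R, cluster_quality(null_shuffle(obs), ks)))  # step 2

# Step 3: compare the observed quality with the null distribution.
q95 <- apply(null_stats, 2, quantile, probs = 0.95)
share_below <- colMeans(sweep(null_stats, 2, observed, "<"))
print(data.frame(k = ks, observed = round(observed, 3),
                 null_q95 = round(q95, 3), share_below = share_below))
```

With this simulated data, the observed two-group quality is expected to lie well above the null values, since the shuffling destroys the A-then-B versus A-then-C structure. In actual use, seqnullcqi performs these steps itself, using the null models of seqnull and the clustering settings given in its arguments.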

Value

seqnullcqi returns a "seqnullcqi" object with the following components:

seqdata

The sequence data generated by the null model (see seqnull).

stats

The cluster quality indices for the null data.

clustrange

The clustering of the data to be validated as an object of class clustrange.

R

The number of bootstrap replications.

kmedoid

Logical. If TRUE, the PAM algorithm was used to cluster the data using wcKMedRange.

hclust.method

Hierarchical method used with hclust.

seqdist.args

List of arguments passed to seqdist for computing the distances.

nullmodel

List of arguments passed to seqnull to generate the sequence data under the null model.

References

Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology. doi:10.1177/00811750211014232

See Also

seqnull for a description of the null models.

Examples

library(WeightedCluster)
data(biofam)

## Create the sequence object from a random subsample of 100 individuals
set.seed(1)
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])

## The fastcluster library greatly improves computation time when using hclust
# library(fastcluster)
## Computing distances
diss <- seqdist(bf.seq, method="HAM")
## Hierarchical clustering
hc <- hclust(as.dist(diss), method="ward.D")
# Computing cluster quality measures.
clustqual <- as.clustrange(hc, diss=diss, ncluster=7)

# Compute cluster quality measures for the null model "combined".
# seqdist.args should be the same as in the seqdist call above, except for the sequence data.
# The clustering method should also be the same as above.
bcq <- seqnullcqi(bf.seq, clustqual, R=5, model="combined",
                  seqdist.args=list(method="HAM"),
                  hclust.method="ward.D")

# Print the results
bcq

## Different kinds of plots

plot(bcq, stat="ASW", type="line")
plot(bcq, stat="ASW", type="density")
plot(bcq, stat="ASW", type="boxplot")


WeightedCluster documentation built on April 17, 2024, 3:01 p.m.