seqnullcqi | R Documentation |
seqnullcqi implements the methodology proposed by Studer (2021) for the validation of sequence analysis typologies using parametric bootstraps. The method works by comparing the cluster quality of an observed typology with the quality obtained by clustering similar but nonclustered data. Several null models are provided to test the different structuring aspects of the sequences that are important in life-course research, namely sequencing, timing, and duration (see function seqnull). This strategy allows identifying the key structural aspects captured by the observed typology. Plot and print methods for seqnullcqi results are also provided.
seqnullcqi(seqdata, clustrange, R, model=c("combined", "duration", "sequencing",
"stateindep", "Markov", "userpos"), seqdist.args=list(),
kmedoid = FALSE, hclust.method="ward.D",
parallel=FALSE, progressbar=FALSE, ...)
## S3 method for class 'seqnullcqi'
plot(x, stat, type = c("line", "density", "boxplot", "seqdplot"),
quant = 0.95, norm = TRUE, legendpos = "topright",
alpha = 0.2, ...)
## S3 method for class 'seqnullcqi'
print(x, norm=TRUE, quant=0.95, digits=2, ...)
seqdata |
State sequence object of class |
clustrange |
The clustering of the data to be validated as an object of class |
model |
String. The model used to generate the similar but nonclustered data. It can be one of |
R |
The number of bootstraps. |
seqdist.args |
List of arguments passed to |
kmedoid |
Logical. If |
hclust.method |
String. Hierarchical method to use with |
x |
A |
stat |
Character. The statistic to plot or "all" for all statistics. See |
type |
Character. The type of graphic to be plotted. If |
quant |
Numeric. Quantile to use for the confidence intervals. |
norm |
Logical. If |
legendpos |
Character. Legend position, see |
alpha |
Transparency parameter for the lines to be drawn (only for |
digits |
Number of digits to be printed. |
parallel |
Logical. Whether to initialize the parallel processing of the |
progressbar |
Logical. Whether to initialize a progressbar using the |
... |
Additional parameters passed to |
The seqnullcqi function provides a validation method for sequence analysis typologies using parametric bootstraps as proposed in Studer (2021). This method works by comparing the value of the cluster quality of an observed typology with the cluster quality obtained by clustering similar but nonclustered data. More precisely, it works as follows:
1. Cluster the observed sequence data and compute the associated cluster quality indices.
2. Repeat R times:
   a. Generate similar but nonclustered data using a null model (see seqnull for available null models).
   b. Cluster the generated data using the same distance measure and clustering algorithm as in step 1.
   c. Record the quality index values of this null clustering.
3. Compare the quality of the observed typology with the one obtained in the R bootstraps with the null sequence data using plot and print methods.
If the cluster quality measure of the observed typology is consistently higher than the ones obtained with null data, a “good” typology has been found.
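The logic of the steps above can be sketched on toy numeric data with base R only. This is a minimal illustration under stated assumptions, not the seqnullcqi implementation: the helper names asw and cluster_quality, the uniform-noise null model, and the choice of three clusters are hypothetical choices made for the example, whereas seqnullcqi works on state sequences with the null models described in seqnull.

```r
# Toy illustration on numeric data (NOT sequence data), base R only.
set.seed(1)

# Average silhouette width computed by hand from a distance object.
asw <- function(d, cl) {
  d <- as.matrix(d)
  ids <- sort(unique(cl))
  s <- vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) < 2) return(0)            # singleton cluster: silhouette is 0
    a <- sum(d[i, own]) / (sum(own) - 1)   # mean distance to own cluster
    b <- min(vapply(ids[ids != cl[i]],     # mean distance to nearest other cluster
                    function(k) mean(d[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Cluster with Ward hierarchical clustering and return the quality index.
cluster_quality <- function(x, k) {
  d <- dist(x)
  cl <- cutree(hclust(d, method = "ward.D"), k = k)
  asw(d, cl)
}

# Step 1: cluster the observed data (three well-separated groups).
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
obs <- centers[rep(1:3, each = 20), ] + matrix(rnorm(120, sd = 0.5), ncol = 2)
obs_asw <- cluster_quality(obs, k = 3)

# Step 2: repeat R times with data from a hypothetical null model
# (uniform noise over the observed range, i.e. similar but nonclustered).
R <- 20
null_asw <- replicate(R, {
  null_data <- apply(obs, 2, function(col) runif(length(col), min(col), max(col)))
  cluster_quality(null_data, k = 3)
})

# Step 3: compare the observed quality with the null distribution.
threshold <- quantile(null_asw, 0.95)
obs_asw > threshold
```

An observed value above the chosen quantile of the null values suggests that the typology captures more structure than the null model can produce by chance. With real sequence data, a larger number of bootstraps than used here would normally be needed.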
Several null models are provided to test the different structuring aspects of the sequences important in life-course research, namely, sequencing, timing, and duration (see function seqnull and Studer, 2021). This strategy allows identifying the key structural aspects captured by the observed typology.
seqnullcqi returns a "seqnullcqi" object with the following components:
seqdata |
The sequence data generated by the null model (see |
stats |
The cluster quality indices for the null data. |
clustrange |
The clustering of the data to be validated as an object of class |
R |
The number of bootstraps |
kmedoid |
Logical. If |
hclust.method |
Hierarchical method to use with |
seqdist.args |
List of arguments passed to |
nullmodel |
List of arguments passed to |
Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology. doi:10.1177/00811750211014232
A brief introduction to the R code needed to use parametric bootstraps for typology validation in sequence analysis is available at https://sequenceanalysis.org/2023/10/19/validating-sequence-analysis-typologies-using-parametric-bootstrap/
See also seqnull for a description of the null models.
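## Load the required packages (TraMineR provides biofam, WeightedCluster provides seqnullcqi)
library(TraMineR)
library(WeightedCluster)
data(biofam)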
## Create the sequence object
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100),10:25])
## The fastcluster library greatly improves computation time when using hclust
# library(fastcluster)
## Computing distances
diss <- seqdist(bf.seq, method="HAM")
## Hierarchical clustering
hc <- hclust(as.dist(diss), method="ward.D")
# Computing cluster quality measures.
clustqual <- as.clustrange(hc, diss=diss, ncluster=7)
# Compute cluster quality measure for the null model "combined"
# seqdist.args should be the same as for seqdist above except the sequence data.
# Clustering methods should be the same as above.
bcq <- seqnullcqi(bf.seq, clustqual, R=5, model=c("combined"),
seqdist.args=list(method="HAM"),
hclust.method="ward.D")
# Print the results
bcq
## Different kinds of plots
plot(bcq, stat="ASW", type="line")
plot(bcq, stat="ASW", type="density")
plot(bcq, stat="ASW", type="boxplot")
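Because the null models target different structuring aspects, it can be useful to validate the same typology against more than one of them. The sketch below repeats the setup of the example so that it runs on its own, then uses the "sequencing" and "duration" values listed for the model argument and the quant and digits arguments of the print and plot methods. The object names bcq.seq and bcq.dur and the fixed seed are assumptions made for illustration, and R=5 is kept only to limit computation time.

```r
library(TraMineR)
library(WeightedCluster)

set.seed(1)  # assumption: fixed seed so the random subsample is reproducible
data(biofam)
bf.seq <- seqdef(biofam[sample.int(nrow(biofam), 100), 10:25])
diss <- seqdist(bf.seq, method = "HAM")
hc <- hclust(as.dist(diss), method = "ward.D")
clustqual <- as.clustrange(hc, diss = diss, ncluster = 7)

# Same typology, same distance and clustering settings, two null models.
bcq.seq <- seqnullcqi(bf.seq, clustqual, R = 5, model = "sequencing",
                      seqdist.args = list(method = "HAM"),
                      hclust.method = "ward.D")
bcq.dur <- seqnullcqi(bf.seq, clustqual, R = 5, model = "duration",
                      seqdist.args = list(method = "HAM"),
                      hclust.method = "ward.D")

# Print with a 0.90 quantile and three digits instead of the defaults.
print(bcq.seq, quant = 0.90, digits = 3)
print(bcq.dur, quant = 0.90, digits = 3)

# Line plot of the ASW against the "duration" null model.
plot(bcq.dur, stat = "ASW", type = "line", quant = 0.90)
```

Comparing the two outputs indicates which structural aspect, sequencing or duration, the observed typology captures beyond what the corresponding null model reproduces.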