regressboot: Bootstrap replicates of a typology and its association with...
In WeightedCluster: Clustering of Weighted Data

regressboot

R Documentation

Bootstrap replicates of a typology and its association with covariates of interest

Description

The regressboot function implements the first part of the Robustness Assessment of Regressions using Cluster Analysis Typologies (RARCAT) procedure, which evaluates the impact of sampling uncertainty on a standard Sequence Analysis and thus assesses the reliability of its findings. See Roth et al. (2024) or the R tutorial available as a WeightedCluster vignette for full details on this procedure and its usefulness. regressboot should be used together with the bootpool function.

Usage

regressboot(formula, data, diss, B = 500, count = FALSE,
            algo = "pam", method = "ward.D",  
            fixed = FALSE, kcluster = 10, cqi = "CH",
            parallel = "no", ncpus = 1, cl = NULL)

Arguments

`formula`	A formula object with the clustering solution on the left side and the covariates of interest on the right side.
`data`	The dataset (data frame) with column names corresponding to the information in formula. The number of individuals (row number) should match the dimension of `diss`.
`diss`	The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported.
`B`	The integer number of bootstrap replicates. Set to 500 by default to attain satisfactory precision for the estimates, as the procedure involves multiple steps.
`count`	Logical. Whether to display a running count of the bootstrap replicates on the screen.
`algo`	The clustering algorithm as a character string. Currently only "pam" (calling the function `wcKMedRange`) and "hierarchical" (calling the function `fastcluster::hclust`) are supported. By default "pam".
`method`	A character string passed as the `method` argument of `hclust`; "ward.D" by default.
`fixed`	Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time.
`kcluster`	Integer. Either the number of clusters in every bootstrap if `fixed` is TRUE or the maximum number of clusters (starting from 2) to be evaluated in each bootstrap if `fixed` is FALSE.
`cqi`	A character string naming the cluster quality index evaluated for each new partition. Any index computed by `as.clustrange` is supported; "CH" (the Calinski-Harabasz index) by default. This also works with `algo` = "pam".
`parallel`	A character string giving the type of parallel operation, if any, to be used by the function `boot::boot`. Options are "no" (default), "multicore" and "snow" (use "snow" on Windows, where "multicore" is unavailable).
`ncpus`	Integer. Number of processes to be used in case of parallel operation. Typically, one would choose this to be the number of available CPUs.
`cl`	A parallel cluster for use if `parallel` = "snow". If not supplied, a cluster on the local machine is created for the duration of the `boot` call.
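When `fixed` is FALSE, each replicate picks the number of clusters between 2 and `kcluster` that maximizes `cqi`. The base-R sketch below illustrates that selection for hierarchical clustering with the default "CH" criterion. It is not WeightedCluster code: `ch_from_diss` and `select_k_by_ch` are hypothetical helpers, and the pseudo-F here squares the dissimilarities as in the Euclidean definition, so its values may differ from those reported by `as.clustrange`.

```r
# Hypothetical helpers (not part of WeightedCluster): a Calinski-Harabasz
# pseudo-F computed from a dissimilarity matrix, and selection of the number
# of clusters that maximizes it over Ward hierarchical partitions.
ch_from_diss <- function(diss, cluster) {
  d2 <- as.matrix(diss)^2                 # squared dissimilarities (assumption)
  n <- nrow(d2)
  k <- length(unique(cluster))
  tss <- sum(d2) / (2 * n)                # total sum of squares
  wss <- 0                                # within-cluster sum of squares
  for (g in unique(cluster)) {
    idx <- which(cluster == g)
    wss <- wss + sum(d2[idx, idx]) / (2 * length(idx))
  }
  ((tss - wss) / (k - 1)) / (wss / (n - k))
}

select_k_by_ch <- function(diss, kmax = 10, method = "ward.D") {
  hc <- hclust(as.dist(diss), method = method)
  ks <- 2:kmax
  ch <- sapply(ks, function(k) ch_from_diss(diss, cutree(hc, k = k)))
  list(k = ks[which.max(ch)], ch = setNames(ch, ks))
}

# Usage on three simulated, well-separated groups in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2),
           matrix(rnorm(40, mean = 12), ncol = 2))
d <- as.matrix(dist(x))
res <- select_k_by_ch(d, kmax = 6)
res$k
```

With `fixed` = TRUE the selection step is skipped and the tree is simply cut at `kcluster` groups in every replicate.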

Details

The regressboot function implements the following steps:
(1) A random sample with replacement (i.e., a bootstrap sample) is drawn from the data.
(2) The bootstrap sample is clustered by applying exactly the same clustering procedure as in the original analysis, i.e., the same dissimilarity measure, clustering algorithm, and method for determining the number of clusters.
(3) For each group, a separate logistic regression predicting the probability of membership in that group is estimated.
(4) The Average Marginal Effect (AME) of each covariate on the probability of being assigned to a given type is retrieved for all sequences belonging to that type.
(5) Steps (1) to (4) are repeated B times, with B typically large.
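The loop below is a minimal base-R sketch of steps (1) to (5), not the regressboot implementation. It assumes a fixed number of clusters `k`, Ward hierarchical clustering, and a single binary (0/1) covariate whose AME is computed as the average difference in predicted probabilities when the covariate is set to 1 versus 0 (the Note lists margins among the packages regressboot actually uses). Individuals not drawn in a replicate are left as NA, a simplification of how the real function fills its `nrow(data)` x `B` matrices. The names `ame_binary` and `rarcat_sketch` are hypothetical.

```r
# Hypothetical base-R sketch of RARCAT steps (1)-(5); not regressboot's code.
# Assumptions: fixed k, Ward clustering, one binary (0/1) covariate.

# AME of a binary covariate: mean difference in predicted probabilities
# when the covariate is set to 1 versus 0 for every observation
ame_binary <- function(fit, data, var) {
  d1 <- data
  d0 <- data
  d1[[var]] <- 1
  d0[[var]] <- 0
  mean(predict(fit, newdata = d1, type = "response") -
         predict(fit, newdata = d0, type = "response"))
}

rarcat_sketch <- function(data, diss, var, k = 3, B = 20) {
  n <- nrow(data)
  ame <- matrix(NA_real_, nrow = n, ncol = B)   # individuals x replicates
  for (b in seq_len(B)) {
    idx <- sample.int(n, n, replace = TRUE)     # (1) bootstrap sample
    hc <- hclust(as.dist(diss[idx, idx]), method = "ward.D")
    cl <- cutree(hc, k = k)                     # (2) same clustering procedure
    boot_data <- data[idx, , drop = FALSE]
    for (g in unique(cl)) {
      boot_data$member <- as.integer(cl == g)   # (3) one logistic model per group
      fit <- glm(reformulate(var, response = "member"),
                 family = binomial, data = boot_data)
      # (4) give the group's AME to every drawn individual assigned to it
      ame[unique(idx[cl == g]), b] <- ame_binary(fit, boot_data, var)
    }
  }
  ame                                           # (5) repeated B times
}

# Usage on simulated data: three groups, covariate linked to group membership
set.seed(2)
grp <- rep(1:3, each = 30)
xy <- cbind(rnorm(90, mean = 5 * grp), rnorm(90, mean = 5 * grp))
dat <- data.frame(x = rbinom(90, 1, prob = c(0.2, 0.5, 0.8)[grp]))
diss <- as.matrix(dist(xy))
ame <- rarcat_sketch(dat, diss, var = "x", k = 3, B = 20)
dim(ame)
```

Duplicated individuals in a bootstrap sample have zero dissimilarity to their copies, so Ward clustering always places the copies in the same group and each drawn individual receives a single AME per replicate.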

Value

The output of regressboot is a list with the following components:

`B`	The number of bootstrap replicates (input parameter).
`optimal.kcluster`	An integer vector with the numbers of clusters for each bootstrap partition. If input parameter `fixed` is FALSE, this corresponds to the selected clustering solution based on the evaluation criterion. If input parameter `fixed` is TRUE, this can in rare cases differ from `kcluster` if two reference clusters have exactly the same estimated association with a covariate.
`cluster.solution`	A numerical matrix with the number of individuals (`nrow(data)`) as row number and the number of bootstrap replicates (`B`) as column number. Each column corresponds to the typology obtained in that bootstrap replicate.
`covar.name`	A character vector with the different associations evaluated in the logistic regression model (based on input parameter `formula`). This corresponds to the name of the covariate for numerical variables and the name with a specific level for factors.
`original.cluster`	A vector of the same size as the dataset with the original clustering, i.e., the one constructed on the original sample with the given method.
`original.ame`	A list with the estimated AMEs corresponding to each association between covariates of interest (as in `covar.name`) and the original typology, i.e., the one constructed on the original sample.
`bootstrap.ame`	A list with the estimated AMEs for all individuals and all bootstrap replicates, corresponding to the associations between covariates of interest (as in `covar.name`) and the typology constructed on each bootstrap sample. For each covariate, the list contains a numerical matrix with the number of individuals (`nrow(data)`) as row number and the number of bootstrap replicates (`B`) as column number.
`std.err`	A list with the estimated standard errors of the AMEs for all individuals and all bootstrap replicates, corresponding to the associations between covariates of interest (as in `covar.name`) and the typology constructed on each bootstrap sample. For each covariate, the list contains a numerical matrix with the number of individuals (`nrow(data)`) as row number and the number of bootstrap replicates (`B`) as column number.
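Each element of `bootstrap.ame` is an individuals-by-replicates matrix. As a rough illustration of how such a matrix can be read, the sketch below averages the AMEs of the members of one original cluster within each replicate, then summarizes these averages across replicates. This naive summary is not the pooling performed by bootpool (see Roth et al., 2024, for that step); `naive_summary`, the toy matrix and its NA entries are hypothetical.

```r
# Hypothetical naive summary of a bootstrap.ame-like matrix (n x B):
# per-replicate mean AME among members of an original cluster, then the
# mean and standard deviation of these per-replicate means.
naive_summary <- function(ame_mat, original_cluster, clusnb) {
  members <- original_cluster == clusnb
  per_boot <- colMeans(ame_mat[members, , drop = FALSE], na.rm = TRUE)
  c(mean = mean(per_boot, na.rm = TRUE), sd = sd(per_boot, na.rm = TRUE))
}

# Toy input: 6 individuals (rows) x 3 bootstrap replicates (columns)
ame_mat <- matrix(c(0.10, 0.12, 0.30, 0.31, NA,   0.29,
                    0.08, NA,   0.28, 0.33, 0.30, 0.27,
                    0.11, 0.09, 0.32, NA,   0.31, 0.30),
                  nrow = 6)
orig <- c(1, 1, 2, 2, 2, 2)
s <- naive_summary(ame_mat, orig, clusnb = 1)
s  # mean ~ 0.0967, sd ~ 0.0153
```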

Note

Uses the following packages: fastcluster, dplyr, margins, boot

Author(s)

Leonard Roth

References

Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC Medical Research Methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8

Studer, M. (2013). WeightedCluster library manual: A practical guide to creating typologies of trajectories in the social sciences with R. University of Geneva.

Hennig, C. (2007) Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52, 258-271.

Examples


## Set the seed for reproducible results
set.seed(1)

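## Load the packages: WeightedCluster (regressboot, bootpool, as.clustrange),
## TraMineR (data and sequence tools) and margins (marginal effect estimation)
library(WeightedCluster)
library(TraMineR)
library(margins)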

## Loading the data (TraMineR package)
data(mvad)

## Creating the state sequence object
mvad.seq <- seqdef(mvad, 17:86)

## Distance computation
diss <- seqdist(mvad.seq, method="LCS")

## Hierarchical clustering
hc <- fastcluster::hclust(as.dist(diss), method="ward.D")

## Computing cluster quality measures
clustqual <- as.clustrange(hc, diss=diss, ncluster=10)
clustqual

# Keep the 2-cluster solution (chosen from the cluster quality measures above)
# and flag membership in cluster 2
mvad$clustering <- clustqual$clustering$cluster2
mvad$membership <- mvad$clustering == 2

# Run logistic regression model for the association between the clustering and a covariate of interest
mod <- glm(membership ~ funemp, mvad, family = "binomial")

# Model results
summary(margins(mod))

## As in the original analysis, hierarchical clustering with the Ward method is used.
## An optimal clustering solution with between 2 and 10 clusters is selected in each
## replicate by maximizing the CH index.
## For illustration purposes, the number of bootstrap replicates (B = 50) is smaller
## than it ought to be.
bootout <- regressboot(clustering ~ funemp, mvad, diss = diss, B = 50,
                      algo = "hierarchical", method = "ward.D",
                      kcluster = 10)
table(bootout$optimal.kcluster)
bootout$covar.name

# Robustness assessment for the association between father unemployment status
# and membership to the higher education trajectory group
result <- bootpool(bootout,  clustering = mvad$clustering, 
                  clusnb = 2, covar = "funempyes")
round(result$pooled.ame, 4)
round(result$standard.error, 4)
round(result$bootstrap.stddev, 4)

WeightedCluster documentation built on June 3, 2025, 3:01 a.m.
