| rarcat | R Documentation |
rarcat is a wrapper for the functions regressboot and bootpool that performs the entire RARCAT procedure on all possible associations between a typology and covariates of interest. See Roth et al. (2024) or the R tutorial provided as a WeightedCluster vignette for details on the corresponding methods and their utility.
rarcat(formula, data, diss,
robust=TRUE, R=500,
kmedoid=FALSE, hclust.method="ward.D",
fixed=FALSE, ncluster=10, cqi="HC",
parallel=FALSE, progressbar=FALSE,
fisher.transform=FALSE,
lmerCtrl=lme4::lmerControl())
## S3 method for class 'rarcat'
plot(x, what="AME", covar=x$factorName[1],
pooled.ame=TRUE, naive.ame=TRUE,
with.legend=TRUE, legend.prop=NA, rows=NA,
cols=NA, main=NULL,
xlab=paste(covar, "Average Marginal Effect"),
xlim=NULL, conf.level=0.95,...)
## S3 method for class 'rarcat'
print(x, conf.level=0.95, single.row = FALSE, digits = 3, ...)
## S3 method for class 'rarcat'
summary(object, ...)
formula |
A formula object with the clustering solution on the left side and the covariates of interest on the right side. |
data |
The dataset (data frame) with column names corresponding to the information in formula. The number of individuals (row number) should match the dimension of |
diss |
The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported. |
robust |
Logical. TRUE (the default) indicates that RARCAT should be performed. FALSE makes the function run much faster but only outputs the original analysis, which is a standard regression analysis for all combinations of reference clusters and covariates. |
R |
Integer. The number of bootstrap replications. Set to 500 by default to attain satisfactory precision around the estimates, as the procedure involves multiple steps. |
kmedoid |
The clustering algorithm as a character string. Currently only "pam" (calling the function |
hclust.method |
A character string with the method argument of |
fixed |
Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time. |
ncluster |
Integer. Either the number of clusters in every bootstrap if |
cqi |
A character string with the cluster quality index to be evaluated for each new partition. Any column of |
parallel |
Logical. Whether to initialize the parallel processing of the |
progressbar |
Logical. Whether to initialize a progressbar using the |
fisher.transform |
Logical. TRUE means that a Fisher transformation is applied in the multilevel model estimation step. This can be recommended in case of extreme associations (close to the -1 or 1 boundaries). FALSE by default. |
lmerCtrl |
Control parameter for lme4 (see |
x |
rarcat object to be printed or plotted. |
object |
rarcat object for summary (diagnostic tools). |
conf.level |
Confidence level for the confidence intervals. 0.95 by default. |
digits |
Number of significant digits to print (3 by default). |
single.row |
Logical. Whether to show the confidence intervals on the same line or on separate lines (FALSE by default). |
what |
Character. Information to plot. With "AME" (default), the bootstrapped AMEs are shown. Set to "ranef" to view the distribution of the observation-level random effects (useful to identify potentially influential, unstable observations). |
covar |
Character. The covariate of interest. |
pooled.ame |
Logical. Whether to add a vertical line and confidence interval for the pooled AME. |
naive.ame |
Logical. Whether to add a vertical line and confidence interval for the naive AME. |
with.legend |
Logical. If |
legend.prop |
Real in range [0,1]. Proportion of the graphic area devoted to the legend plot when with.legend=TRUE. The default value is set according to the place (bottom or right of the graphic area) where the legend is plotted. |
rows |
Integers. Number of rows of the plot panel. |
cols |
Integers. Number of columns of the plot panel. |
main |
Character string. Title of the graphic. |
xlab |
x axis label. |
xlim |
Numerics. Limits of the x-axis. |
... |
Additional parameters passed to/from methods. |
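As a rough illustration of why fisher.transform can help with associations close to the -1 or 1 boundaries, the base R sketch below applies Fisher's z transformation (atanh) to a few values and maps them back with tanh. It assumes that the "Fisher transformation" mentioned above is the usual atanh transform; the AME values are hypothetical, and the code is not taken from the package.

```r
# Hypothetical AMEs, some of them close to the -1 and 1 boundaries
ame <- c(-0.98, -0.5, 0, 0.5, 0.98)

# Fisher's z transformation maps (-1, 1) onto the whole real line,
# stretching values that lie near the boundaries
# (assumption: this is the transformation meant by fisher.transform)
z <- atanh(ame)

# The inverse transformation brings estimates back to the AME scale
back <- tanh(z)

print(round(cbind(ame = ame, z = z, back = back), 3))
```

On the transformed scale, a model that assumes normally distributed effects cannot produce estimates outside the admissible range once they are back-transformed.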
The rarcat function runs a standard typology-based association study and evaluates the impact of sampling uncertainty on the results, thus assessing the reproducibility of the analysis.
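The resampling idea can be sketched with base R only. The toy example below is not the package's implementation: it draws bootstrap samples of observations, reuses a pre-computed dissimilarity matrix by subsetting it (which is why the dissimilarities must not depend on the resample), and reclusters each resample with Ward hierarchical clustering. The data, the object names and the choice of two clusters are assumptions made for illustration.

```r
set.seed(1)

# Toy data: 30 observations in two dimensions (purely illustrative)
x <- matrix(rnorm(60), ncol = 2)
diss <- as.matrix(dist(x))   # pre-computed dissimilarity matrix

R <- 5                       # tiny number of bootstraps, for illustration only

boot_clusterings <- lapply(seq_len(R), function(r) {
  # Resample observations with replacement
  idx <- sample(nrow(diss), replace = TRUE)
  # Reuse the pre-computed dissimilarities for the resampled observations
  sub <- diss[idx, idx]
  # Recluster the resample (Ward hierarchical clustering, two clusters)
  hc <- hclust(as.dist(sub), method = "ward.D")
  data.frame(id = idx, cluster = unname(cutree(hc, k = 2)))
})

# Cluster sizes in the first bootstrap sample
table(boot_clusterings[[1]]$cluster)
```

In the full procedure, a regression would then be estimated on each resampled typology and the resulting AMEs pooled; those steps are left out of this sketch.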
The output of rarcattables contains the following tables:
The output of bootpool is a list with the following components:
nobs |
An integer with the number of observations (i.e., number of estimated AMES from the function |
pooled.ame |
A numeric value indicating the pooled AME, which is the mean change in cluster membership probability for a change in the level of the covariate of interest over all bootstraps and all individuals belonging to the reference cluster in the original typology. |
standard.error |
Standard error of the pooled AME, which diminishes asymptotically as the number of bootstraps increases. |
bootstrap.stddev |
The estimate for the standard deviation of the bootstrap random effect. This can be used to construct a prediction interval for the association of interest (see Roth et al. 2024 for details on how to compute this). |
observation.stddev |
The estimate for the standard deviation of the observation random effect. |
bootstrap.ranef |
A vector of size |
observation.ranef |
A vector of size |
original.analysis |
Average Marginal Effects (AMEs) estimated with multivariable logistic regressions and representing the expected change in the probability of belonging to a trajectory group (a reference cluster) for a change in the level of a variable (a covariate of interest), together with their confidence intervals. |
robust.analysis |
Pooled AMEs from the bootstrap procedure and their prediction intervals, representing the range of expected values if the clustering and associated regressions were performed on a new sample from the same underlying distribution. This table provides robust estimates for a typology-based association study. |
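To make the link between pooled.ame, standard.error and conf.level concrete, the sketch below builds a confidence interval with a normal approximation. The numeric values are hypothetical, and the normal approximation is an assumption made for illustration; the package may compute its intervals differently, and the prediction interval based on bootstrap.stddev is described in Roth et al. (2024) and is not reproduced here.

```r
# Hypothetical values standing in for the pooled.ame and standard.error components
pooled_ame <- 0.12
std_error  <- 0.03
conf_level <- 0.95

# Normal-approximation confidence interval (an assumption, see above)
z  <- qnorm(1 - (1 - conf_level) / 2)
ci <- pooled_ame + c(-1, 1) * z * std_error

print(round(ci, 3))
```

Because the standard error shrinks as the number of bootstraps grows, this interval narrows with larger R.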
Leonard Roth
Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC Medical Research Methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8
## Loading the packages used below
library(TraMineR)
library(WeightedCluster)
## Loading the data (TraMineR package)
data(mvad)
## Reducing sample size to speed up computations
mvad <- mvad[1:200,]
## Creating the state sequence object
mvad.seq <- seqdef(mvad[, 17:86])
## Distance computation
diss <- seqdist(mvad.seq, method="LCS")
## Hierarchical clustering
hc <- fastcluster::hclust(as.dist(diss), method="ward.D")
## Computing cluster quality measures
clustqual <- as.clustrange(hc, diss=diss, ncluster=6)
## A two-cluster solution is chosen here (column cluster2)
mvad$clustering <- clustqual$clustering$cluster2
## The formula should include the typology (dependent) and the covariates of interest
## As in the original analysis, hierarchical clustering with Ward method is implemented
## The number of clusters is fixed to 2 here; larger values should often be used.
## For illustration purposes, the number of bootstraps is smaller than it ought to be
rarcatout <- rarcat(clustering ~ Grammar + gcse5eq, mvad, diss, R = 30,
kmedoid=TRUE, fixed = TRUE, ncluster = 2)
## Assess the robustness of the original analysis
rarcatout
#plot(rarcatout, covar="gcse5eqyes")
#plot(rarcatout, covar="gcse5eqyes", what="ranef")
#summary(rarcatout)