| rarcat | R Documentation |
rarcat runs the entire RARCAT procedure on all possible associations between a typology and covariates of interest. See Roth et al. (2024) or the R tutorial provided as a WeightedCluster vignette for details on the corresponding methods and their utility.
rarcat(formula, data, diss,
robust=TRUE, R=500,
kmedoid=FALSE, hclust.method="ward.D",
fixed=FALSE, ncluster=10, cqi="HC",
parallel=FALSE, progressbar=FALSE,
fisher.transform=FALSE,
lmerCtrl=lme4::lmerControl())
## S3 method for class 'rarcat'
plot(x, what="AME", covar=x$factorName[1],
pooled.ame=TRUE, naive.ame=TRUE,
with.legend=TRUE, legend.prop=NA, rows=NA,
cols=NA, main=NULL,
xlab=paste(covar, "Average Marginal Effect"),
xlim=NULL, conf.level=0.95,...)
## S3 method for class 'rarcat'
print(x, conf.level=0.95, single.row = FALSE, digits = 3, ...)
## S3 method for class 'rarcat'
summary(object, ...)
formula |
A formula object with the clustering solution on the left side and the covariates of interest on the right side. |
data |
The dataset (data frame) with column names corresponding to the information in formula. The number of individuals (row number) should match the dimension of |
diss |
The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported. |
robust |
Logical. TRUE (the default) indicates that RARCAT should be performed. FALSE makes the function run much faster but only outputs the original analysis, which is a standard regression analysis for all combinations of reference clusters and covariates. |
R |
Integer. The number of bootstrap replications. Set to 500 by default to attain a satisfactory precision around the estimates, as the procedure involves multiple steps. |
kmedoid |
The clustering algorithm as a character string. Currently only "pam" (calling the function |
hclust.method |
A character string with the method argument of |
fixed |
Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time. |
ncluster |
Integer. Either the number of clusters in every bootstrap if |
cqi |
A character string with the cluster quality index to be evaluated for each new partition. Any column of |
parallel |
Logical. Whether to initialize the parallel processing of the |
progressbar |
Logical. Whether to initialize a progressbar using the |
fisher.transform |
Logical. TRUE means that a Fisher transformation is applied in the multilevel model estimation step. This can be recommended in case of extreme associations (close to the -1 or 1 boundaries). FALSE by default. |
lmerCtrl |
Control parameter for lme4 (see |
x |
rarcat object to be printed or plotted. |
object |
rarcat object for summary (diagnostic tools). |
conf.level |
Confidence level for the confidence intervals. 0.95 by default. |
digits |
Number of significant digits to print (3 by default). |
single.row |
Logical. Whether to show the confidence intervals on the same line as the estimates rather than on a separate line (FALSE by default). |
what |
Character. Information to plot. With "AME" (default), the bootstrapped AMEs are shown. Set to "ranef" to view the distribution of the observation-level random effects (useful for identifying potentially influential unstable observations). |
covar |
Character. The covariate of interest. |
pooled.ame |
Logical. Whether to add a vertical line and confidence interval for the pooled AME. |
naive.ame |
Logical. Whether to add a vertical line and confidence interval for the naive AME. |
with.legend |
Logical. If |
legend.prop |
Real in range [0,1]. Proportion of the graphic area devoted to the legend plot when with.legend=TRUE. The default value is set according to the place (bottom or right of the graphic area) where the legend is plotted. |
rows |
Integers. Number of rows of the plot panel. |
cols |
Integers. Number of columns of the plot panel. |
main |
Character string. Title of the graphic. |
xlab |
x axis label. |
xlim |
Numerics. Limits of the x-axis. |
... |
Additional parameters passed to/from methods. |
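The Fisher transformation mentioned for fisher.transform can be illustrated in base R. The sketch below assumes the usual Fisher z-transformation (atanh) and its inverse (tanh); how rarcat applies it internally is not shown here, the helper names fisher_z and fisher_inv are hypothetical, and the input values are made up for illustration.

```r
# Hypothetical helpers: the usual Fisher z-transformation and its inverse.
fisher_z   <- function(r) atanh(r)   # maps (-1, 1) onto the real line
fisher_inv <- function(z) tanh(z)    # maps the real line back to (-1, 1)

# Made-up association values, some close to the -1 and 1 boundaries.
ame <- c(-0.95, -0.50, 0.00, 0.50, 0.95)

z    <- fisher_z(ame)
back <- fisher_inv(z)

# The round trip recovers the original values.
stopifnot(isTRUE(all.equal(back, ame)))

# A symmetric interval built on the z scale stays within (-1, 1) once
# back-transformed, which is not guaranteed on the original scale.
ci_z <- z[5] + c(-1, 1) * 0.5        # 0.5 is an arbitrary half-width
ci   <- fisher_inv(ci_z)
print(ci)
```

Working on the transformed scale is the usual motivation for such an option when estimates lie near the boundaries, since intervals computed directly on the original scale could exceed them.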
The rarcat function runs a standard typology-based association study and evaluates the impact of sampling uncertainty on the results, thus assessing the reproducibility of the analysis.
The output is a rarcat object containing the following components:
arguments |
A list with all the arguments passed to |
formula |
The formula used in the |
factorName |
The name of the factors/covariates/coefficients used in the model. |
clusterNames |
The names of the clusters provided to |
clustering |
The original clustering used in the |
AMElist |
The list of AME objects for the naive analysis (see margins). |
bootout |
The bootstrap results storing the AME and standard error per observation and clustering solution. |
pooled.ame |
The pooled AME. |
standard.error |
The standard error of the pooled AME. |
bootstrap.stddev |
The estimated standard deviation of the AME between bootstraps. |
observation.stddev |
The estimated standard deviation of the AME between observations. |
observation.ranef |
The estimated observation-level random effect of the AME. |
observation.stdranef |
The estimated standardized observation-level random effect of the AME. |
cluster.solution |
The cluster solution in each bootstrap. |
optimal.number |
The retained number of clusters in each bootstrap. |
Leonard Roth, Matthias Studer
Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC Medical Research Methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8
Vignette: R Tutorials: Robustness Assessment of Regressions using Cluster Analysis Typologies
## Loading the packages and the data (mvad comes from the TraMineR package)
library(TraMineR)
library(WeightedCluster)
data(mvad)
## Reducing sample size to speed up computations
mvad <- mvad[1:200,]
## Creating the state sequence object
mvad.seq <- seqdef(mvad[, 17:86])
## Distance computation
diss <- seqdist(mvad.seq, method="LCS")
## A two-cluster solution is chosen here
mvad$clustering <- wcKMedoids(diss, k=2, cluster.only=TRUE)
## The formula should include the typology (dependent) and the covariates of interest
## As in the original analysis, k-medoids clustering is used (kmedoid=TRUE)
## The number of clusters is fixed to 2 here; larger values should often be used.
## For illustration purposes, the number of bootstraps is smaller than it ought to be
rarcatout <- rarcat(clustering ~ Grammar + gcse5eq, mvad, diss, R = 10,
kmedoid=TRUE, fixed = TRUE, ncluster = 2)
## Assess the robustness of the original analysis
rarcatout
## Not run:
## Ensure the plotting window is large enough
## prior to running those lines.
plot(rarcatout, covar="gcse5eqyes")
plot(rarcatout, covar="gcse5eqyes", what="ranef")
summary(rarcatout)
## End(Not run)
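Beyond the print, plot and summary methods, the components listed in the value section can be inspected directly on the returned object. The following is a minimal sketch that repeats the example above so that it runs on its own; it assumes that TraMineR and WeightedCluster are installed, and since the internal structure of each component is not described on this page, str() is used rather than assuming a particular shape.

```r
library(TraMineR)
library(WeightedCluster)

## Same data preparation as in the example above
data(mvad)
mvad <- mvad[1:200, ]
mvad.seq <- seqdef(mvad[, 17:86])
diss <- seqdist(mvad.seq, method = "LCS")
mvad$clustering <- wcKMedoids(diss, k = 2, cluster.only = TRUE)

## Small R for illustration only
rarcatout <- rarcat(clustering ~ Grammar + gcse5eq, mvad, diss, R = 10,
                    kmedoid = TRUE, fixed = TRUE, ncluster = 2)

## Names that can be passed to the covar argument of plot()
rarcatout$factorName

## Pooled AME and its standard error (structure shown with str()
## because it is not documented on this page)
str(rarcatout$pooled.ame)
str(rarcatout$standard.error)

## Number of clusters retained in each bootstrap
str(rarcatout$optimal.number)

## Printing with a different confidence level and a compact layout,
## using the documented arguments of the print method
print(rarcatout, conf.level = 0.90, single.row = TRUE, digits = 2)
```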