rarcat: Robustness Assessment of Regressions using Cluster Analysis...

rarcat    R Documentation

Robustness Assessment of Regressions using Cluster Analysis Typologies (RARCAT)

Description

rarcat runs the entire RARCAT procedure on all possible associations between a typology and the covariates of interest. See Roth et al. (2024) or the R tutorial provided as a WeightedCluster vignette for details on the methods and their utility.

Usage

rarcat(formula, data, diss,
       robust=TRUE, R=500,
       kmedoid=FALSE, hclust.method="ward.D",
       fixed=FALSE, ncluster=10, cqi="HC",
       parallel=FALSE, progressbar=FALSE,
       fisher.transform=FALSE,
       lmerCtrl=lme4::lmerControl())
## S3 method for class 'rarcat'
plot(x, what="AME", covar=x$factorName[1],
     pooled.ame=TRUE, naive.ame=TRUE,
     with.legend=TRUE, legend.prop=NA, rows=NA,
     cols=NA, main=NULL,
     xlab=paste(covar, "Average Marginal Effect"),
     xlim=NULL, conf.level=0.95, ...)
## S3 method for class 'rarcat'
print(x, conf.level=0.95, single.row = FALSE, digits = 3, ...)
## S3 method for class 'rarcat'
summary(object, ...)

Arguments

formula

A formula object with the clustering solution on the left side and the covariates of interest on the right side.

data

The dataset (data frame) with column names corresponding to the information in formula. The number of individuals (row number) should match the dimension of diss.

diss

The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported.

robust

Logical. TRUE (the default) indicates that RARCAT should be performed. FALSE makes the function run much faster but only outputs the original analysis, which is a standard regression analysis for all combinations of reference clusters and covariates.

R

Integer. The number of bootstrap replications. Set to 500 by default to attain satisfactory precision around the estimates, as the procedure involves multiple steps.

kmedoid

Logical. Selects the clustering algorithm: k-medoids (PAM) clustering (calling the function wcKMedRange) or hierarchical clustering (calling the function fastcluster::hclust). FALSE by default, as shown in Usage.

hclust.method

A character string with the method argument of hclust, "ward.D" by default.

fixed

Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time.

ncluster

Integer. Either the number of clusters in every bootstrap if fixed is TRUE or the maximum number of clusters (starting from 2) to be evaluated in each bootstrap if fixed is FALSE.

cqi

A character string with the cluster quality index to be evaluated for each new partition. Any column of as.clustrange is supported (for instance "CH", the Calinski-Harabasz index). The Usage section shows "HC" as the default value. Also works with PAM clustering.

parallel

Logical. Whether to initialize parallel processing with the future package. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized using default values.

progressbar

Logical. Whether to initialize a progress bar using the future package. If FALSE (default), the current progress bar handlers are used. If TRUE, new global progress bar handlers are initialized.

fisher.transform

Logical. TRUE means that a Fisher transformation is applied in the multilevel model estimation step. This can be recommended in case of extreme associations (close to the -1 or 1 boundaries). FALSE by default.

lmerCtrl

Control parameter for lme4 (see lmerControl).

x

rarcat object to be printed or plotted.

object

rarcat object for summary (diagnostic tools).

conf.level

Confidence level for the confidence intervals. 0.95 by default.

digits

Number of significant digits to print (3 by default).

single.row

Logical. Whether to show the confidence interval on the same line as the estimate (TRUE) or on a separate line (FALSE, the default).

what

Character. Information to plot. With "AME" (default), the bootstrapped AMEs are shown. Set to "ranef" to view the distribution of the observation-level random effects (useful to identify potentially influential unstable observations).

covar

Character. The covariate of interest.

pooled.ame

Logical. Whether to add a vertical line and confidence interval for the pooled AME.

naive.ame

Logical. Whether to add a vertical line and confidence interval for the naive AME.

with.legend

Logical. If FALSE, the legend is not plotted.

legend.prop

Real in range [0,1]. Proportion of the graphic area devoted to the legend plot when with.legend=TRUE. The default value is set according to the place (bottom or right of the graphic area) where the legend is plotted.

rows

Integers. Number of rows of the plot panel.

cols

Integers. Number of columns of the plot panel.

main

Character string. Title of the graphic.

xlab

x axis label.

xlim

Numerics. Limits of the x-axis.

...

Additional parameters passed to/from methods.
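
As a rough illustration of what the fisher.transform argument refers to, the sketch below applies the usual Fisher z-transformation (atanh) to values bounded in (-1, 1) and maps them back with tanh. The AME values are hypothetical, and the assumption that rarcat uses exactly this transformation internally is not confirmed by this page.

```r
# Hypothetical AME values, some close to the -1 / 1 boundaries
ame <- c(-0.98, -0.50, 0.00, 0.75, 0.97)

# Fisher z-transformation: maps (-1, 1) onto the whole real line
z <- atanh(ame)

# Back-transformation to the original AME scale
ame_back <- tanh(z)

print(round(z, 3))
print(all.equal(ame, ame_back))
```

Values near the boundaries are stretched the most on the transformed scale, which is why such a transformation can help when associations are extreme.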

Details

The rarcat function runs a standard typology-based association study and evaluates the impact of sampling uncertainty on the results, thus assessing the reproducibility of the analysis.
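
To make the idea of sampling uncertainty concrete, here is a minimal base R sketch of one ingredient of such a procedure: resampling observations and re-clustering a pre-computed dissimilarity matrix. It is not the rarcat implementation. The toy data, the object names, and the use of stats::hclust (rather than fastcluster::hclust) are assumptions made for illustration, and the regression and pooling steps are omitted.

```r
set.seed(1)

# Toy data: 30 observations in two dimensions (hypothetical)
n <- 30
xy <- matrix(rnorm(2 * n), ncol = 2)

# Pre-computed dissimilarity matrix, computed once on the full sample
diss <- as.matrix(dist(xy))

R <- 5   # number of bootstrap replications (tiny, for illustration only)
k <- 2   # fixed number of clusters, in the spirit of fixed = TRUE

boot_clusters <- vector("list", R)
for (r in seq_len(R)) {
  # Draw observations with replacement
  idx <- sample.int(n, size = n, replace = TRUE)
  # Subset rows and columns: no dissimilarity is recomputed
  diss_r <- diss[idx, idx]
  # Hierarchical clustering (Ward) on the resampled matrix
  hc <- hclust(as.dist(diss_r), method = "ward.D")
  # Keep track of which original observation each row refers to
  boot_clusters[[r]] <- data.frame(id = idx,
                                   cluster = unname(cutree(hc, k = k)))
}

head(boot_clusters[[1]])
```

Because the matrix is only subset by index, the pairwise dissimilarities do not depend on the resample, which matches the restriction stated for the diss argument.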

Value

The output is a rarcat object containing the following components:

arguments

A list with all the arguments passed to rarcat.

formula

The formula used in the rarcat call.

factorName

The name of the factors/covariates/coefficients used in the model.

clusterNames

The names of the clusters provided to rarcat.

clustering

The original clustering used in the rarcat formula call.

AMElist

The list of AME objects for the naive analysis (see margins).

bootout

The bootstrap results storing the AME and standard error per observation and clustering solution.

pooled.ame

The pooled AME.

standard.error

The standard error of the pooled AME.

bootstrap.stddev

The estimated standard deviation of the AME between bootstraps.

observation.stddev

The estimated standard deviation of the AME between observations.

observation.ranef

The estimated observation-level random effect of the AME.

observation.stdranef

The estimated standardized observation-level random effect of the AME.

cluster.solution

The cluster solution in each bootstrap.

optimal.number

The retained number of clusters in each bootstrap.
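
As a hedged illustration of how the pooled.ame and standard.error components relate to the conf.level argument, the sketch below builds a two-sided confidence interval under a normal approximation. The numeric values are hypothetical stand-ins, and whether print and plot compute their intervals in exactly this way is an assumption.

```r
# Hypothetical values standing in for one pooled AME and its standard error
pooled_ame <- 0.12
std_error  <- 0.04
conf_level <- 0.95

# Two-sided normal quantile for the requested confidence level
z <- qnorm(1 - (1 - conf_level) / 2)

# Normal-approximation confidence interval (an assumption, see above)
ci <- c(lower = pooled_ame - z * std_error,
        upper = pooled_ame + z * std_error)

print(round(ci, 3))
```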

Author(s)

Leonard Roth, Matthias Studer

References

Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC medical research methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8.

See Also

Vignette: R Tutorials: Robustness Assessment of Regressions using Cluster Analysis Typologies

Examples

## Loading the packages and the data (mvad comes from the TraMineR package)
library(TraMineR)
library(WeightedCluster)
data(mvad)

## Reducing sample size to speed up computations
mvad <- mvad[1:200,]


## Creating the state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Distance computation
diss <- seqdist(mvad.seq, method="LCS")

## A two-cluster solution is chosen here
mvad$clustering <- wcKMedoids(diss, k=2, cluster.only=TRUE)

## The formula should include the typology (dependent) and the covariates of interest
## As in the original analysis, k-medoids (PAM) clustering is used (kmedoid=TRUE)
## The number of clusters is fixed to 2 here; larger values should often be used.
## For illustration purposes, the number of bootstraps is smaller than what it ought to be
rarcatout <- rarcat(clustering ~ Grammar + gcse5eq, mvad, diss, R = 10, 
                    kmedoid=TRUE, fixed = TRUE, ncluster = 2)

## Assess the robustness of the original analysis
rarcatout

## Not run: 
## Ensure the plotting window is large enough
## prior to running those lines.
plot(rarcatout, covar="gcse5eqyes")
plot(rarcatout, covar="gcse5eqyes", what="ranef")
summary(rarcatout)

## End(Not run)
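
Since parallel=FALSE means the current future plan is used, a plan can be set up by hand before calling rarcat. The following is a minimal sketch assuming the future package is installed. The choice of two workers is arbitrary, and the rarcat call itself is left out.

```r
## Set a multisession plan manually, then restore the previous one
if (requireNamespace("future", quietly = TRUE)) {
  ## plan() returns the previously active plan, which is stored here
  old_plan <- future::plan(future::multisession, workers = 2)

  ## Number of workers available under the current plan
  n_workers <- future::nbrOfWorkers()
  print(n_workers)

  ## A call such as rarcat(..., parallel = FALSE) placed here
  ## would run under the plan set above

  ## Restore the previous plan
  future::plan(old_plan)
}
```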

WeightedCluster documentation built on April 27, 2026, 3:04 a.m.