CookDistance: Cook's distance for individual subjects

View source: R/CookDistance.R

CookDistance R Documentation

Cook's distance for individual subjects

Description

CookDistance allows the user to identify those subjects with a greater influence on the predicted values or on the estimation of the fixed effects for the treatment group, based on the calculation of Cook's distances.

Usage

CookDistance(
  model,
  type = "fitted",
  cook_thr = NA,
  label_angle = 0,
  maxIter = 1000,
  verbose = TRUE
)

Arguments

model

An object of class "lme" representing the linear mixed-effects model fitted by lmmModel().

type

Type of Cook's distance to calculate. Possible options are fitted, to calculate Cook's distances based on the change in fitted values, or fixef, to calculate Cook's distances based on the change in the fixed effects. See the Details section for more information.

cook_thr

Numeric value indicating the threshold for the Cook's distance. If not specified, the threshold is set to three times the mean of the Cook's distance values.

label_angle

Numeric value indicating the angle for the label of subjects with a Cook's distance greater than cook_thr.

maxIter

Maximum number of iterations for the optimization algorithm. Defaults to 1000.

verbose

Logical indicating if the subjects with a Cook's distance greater than cook_thr should be printed to the console.

Details

The identification of influential subjects is based on the calculation of Cook's distances. The Cook's distances can be calculated based on the change in fitted values or fixed effects.

  • Cook's distances based on the change in fitted values

When type = "fitted", the Cook's distances are calculated as the normalized change in fitted response values due to the removal of a subject from the model. First, leave-one-subject-out models are fitted, removing each subject in turn and refitting the model. Then, the Cook's distance for subject i, (D_i), is calculated as:

D_i=\frac{\sum_{j=1}^n\Bigl(\hat{y}_{j}-\hat{y}_{j_{(-i)}}\Bigr)^2}{rank(X)\cdot MSE}

where \hat{y}_j is the j^{th} fitted response value using the complete model, and \hat{y}_{j_{(-i)}} is the j^{th} fitted response value obtained using the model where subject i has been removed.

The denominator of the expression is the product of two terms. The first is the number of fixed-effects coefficients, which, under the assumption that the design matrix is of full rank, is equivalent to the rank of the design matrix. The second is the mean square error (MSE) of the model, which normalizes the Cook's distance.
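The calculation above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the code used by CookDistance: it uses the Orthodont data and a random-intercept model from the nlme package instead of a model fitted by lmmModel(), it compares population-level fitted values (fixed effects only), and it takes the residual variance of the complete model as the MSE.

```r
library(nlme)  # provides lme() and the Orthodont example data

# Illustrative data (not grwth_data), converted to a plain data frame
dat <- as.data.frame(Orthodont)
dat$Subject <- factor(as.character(dat$Subject))

# Complete model: random intercept per subject
full <- lme(distance ~ age, random = ~ 1 | Subject, data = dat)

X   <- model.matrix(~ age, data = dat)  # fixed-effects design matrix
p   <- qr(X)$rank                       # rank(X)
mse <- full$sigma^2                     # assumption: residual variance used as MSE

# Population-level fitted values from the complete model
y_full <- drop(X %*% fixef(full))

# Leave-one-subject-out: refit without subject s, then compare fitted values
cook_fitted <- sapply(levels(dat$Subject), function(s) {
  d_i   <- droplevels(dat[dat$Subject != s, ])
  fit_i <- lme(distance ~ age, random = ~ 1 | Subject, data = d_i)
  y_i   <- drop(X %*% fixef(fit_i))
  sum((y_full - y_i)^2) / (p * mse)
})

# Subjects with the largest distances first
head(sort(cook_fitted, decreasing = TRUE))
```

The fitted values are evaluated on the complete data set for every refit, so the sum runs over all n observations, as in the formula.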

  • Cook's distances based on the change in fixed effects values

The identification of the subjects with a greater influence on the estimated fixed effects is based on the calculation of Cook's distances, as described in Gałecki and Burzykowski (2013). To compute the Cook's distance for the fixed effect estimates (i.e., the contribution of each subject to the coefficients of its treatment group), first a matrix containing the leave-one-subject-out estimates of the fixed effects is calculated. Then, the Cook's distances are calculated according to:

D_i \equiv \frac{(\hat{\beta} - \hat{\beta}_{(-i)})^{\top}[\widehat{Var(\hat{\beta})}]^{-1}(\hat{\beta} - \hat{\beta}_{(-i)})}{rank(X)}

where \beta represents the vector of fixed effects and \hat{\beta}_{(-i)} is the estimate of the parameter vector \beta obtained by fitting the model to the data with the i-th subject excluded. The denominator of the expression is equal to the number of the fixed-effects coefficients, which, under the assumption that the design matrix is of full rank, is equivalent to the rank of the design matrix.
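As a rough illustration of the fixed-effects version, the quadratic form can be computed as follows. This is a minimal sketch under stated assumptions, not the implementation in CookDistance: it uses the Orthodont data and a random-intercept model from the nlme package as a stand-in for a model fitted by lmmModel(), and it takes vcov() of the complete model as the estimated variance of the fixed effects.

```r
library(nlme)  # provides lme() and the Orthodont example data

# Illustrative data (not grwth_data), converted to a plain data frame
dat <- as.data.frame(Orthodont)
dat$Subject <- factor(as.character(dat$Subject))

# Complete model and its fixed-effects estimates
full     <- lme(distance ~ age, random = ~ 1 | Subject, data = dat)
beta_hat <- fixef(full)
V_inv    <- solve(vcov(full))  # inverse of the estimated Var(beta_hat)
p        <- qr(model.matrix(~ age, data = dat))$rank  # rank(X)

# Matrix of leave-one-subject-out estimates: one row per removed subject
beta_loo <- t(sapply(levels(dat$Subject), function(s) {
  d_i <- droplevels(dat[dat$Subject != s, ])
  fixef(lme(distance ~ age, random = ~ 1 | Subject, data = d_i))
}))

# Quadratic form in the change of the estimates, divided by rank(X)
cook_fixef <- apply(beta_loo, 1, function(b) {
  d <- beta_hat - b
  drop(t(d) %*% V_inv %*% d) / p
})

head(sort(cook_fixef, decreasing = TRUE))
```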

Value

A plot of the Cook's distance value for each subject, indicating those subjects whose Cook's distance is greater than cook_thr.

If saved to a variable, the function returns a vector with the Cook's distances for each subject.
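The default threshold described for cook_thr (three times the mean of the Cook's distance values) can be reproduced on a vector of distances. The sketch below uses made-up values and subject names purely for illustration; whether the vector returned by CookDistance carries subject names is an assumption here.

```r
# Hypothetical Cook's distances for five subjects (values invented for illustration)
cook <- c(M01 = 0.02, M02 = 0.05, M03 = 0.01, M04 = 0.40, M05 = 0.03)

# Default rule: three times the mean of the distances
thr <- 3 * mean(cook)

# Subjects whose distance exceeds the threshold
flagged <- names(cook)[cook > thr]
flagged
```

With these values the mean is 0.102, the threshold is 0.306, and only M04 is flagged.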

References

  • Andrzej Gałecki & Tomasz Burzykowski (2013). Linear Mixed-Effects Models Using R: A Step-by-Step Approach. First Edition. Springer, New York. ISBN 978-1-4614-3899-1

Examples

# Load the example data
data(grwth_data)
# Fit the model
lmm <- lmmModel(
  data = grwth_data,
  sample_id = "subject",
  time = "Time",
  treatment = "Treatment",
  tumor_vol = "TumorVolume",
  trt_control = "Control",
  drug_a = "DrugA",
  drug_b = "DrugB",
  combination = "Combination"
  ) 
# Calculate Cook's distances for each subject
CookDistance(model = lmm)
# Change the Cook's distance threshold
CookDistance(model = lmm, cook_thr = 0.15)


SynergyLMM documentation built on Aug. 22, 2025, 5:11 p.m.