```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ">"
)
library(dplyr)
library(tidyr)
```
This document describes the computational background and the use of the `icc()` function from the `Agree` package. We developed the `icc()` function for this package in connection with a simulation study on sample size requirements for studies on reliability and measurement error @mokkink1 and a methodological paper on how to design and conduct a study on reliability and measurement error @mokkink2.
```r
library(Agree)
```
Intraclass correlation coefficients (ICCs) are usually computed for continuous ratings. As an example, we use data from the study by @dikmans2017. These data are based on photographs of the breasts of 50 women after breast reconstruction. The photographs are independently scored by 5 surgeons, the patients, and 3 mammography nurses. Each rater scored the quality of the reconstruction on a 5-point ordinal scale, anchored by 'very dissatisfied' at the left end and 'very satisfied' at the right end. They specifically rated the volume, shape, symmetry, scars and nipple. For the ICC examples, we use the sum of the scores for volume, shape, symmetry, scars and nipple as an overall rating from each rater.
```r
breast_scores <- Agree::breast %>%
  dplyr::select(Patient_score, PCH1_score, PCH2_score, PCH3_score, PCH4_score,
                PCH5_score, Mam1_score, Mam2_score, Mam3_score)
head(breast_scores)
```
The example data contain missing values. The `icc()` function can handle these missing values, because the variance components used to compute the ICC are estimated with a linear mixed model.
To fit a mixed model, the data need to be restructured into long format. We can use the `pivot_longer()` function from the `tidyr` package to do that:
```r
breast_long <- breast_scores %>%
  mutate(id = 1:nrow(breast_scores)) %>% # add id column
  pivot_longer(cols = -id, names_to = "rater", values_to = "score")
breast_long
```
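To make the long format concrete, the sketch below restructures a small table of ratings with base R instead of `tidyr`. The data frame `toy_wide` and its values are made up for illustration. It shows that a missing rating simply becomes a row with `NA`, and that after dropping such rows each subject can have a different number of ratings, which a mixed model can work with.

```r
# Hypothetical toy ratings: 3 subjects (rows) scored by 3 raters (columns),
# with one missing rating. Values are made up for illustration.
toy_wide <- data.frame(
  R1 = c(10, 14, 18),
  R2 = c(11, NA, 17),
  R3 = c(9, 15, 19)
)

# Base-R equivalent of pivot_longer(): one row per subject-rater combination.
# unlist() stacks the columns, so ids repeat per column and raters repeat per row.
toy_long <- data.frame(
  id    = rep(seq_len(nrow(toy_wide)), times = ncol(toy_wide)),
  rater = rep(names(toy_wide), each = nrow(toy_wide)),
  score = unlist(toy_wide, use.names = FALSE)
)

# Only observed rows are needed; subject 2 keeps two of its three ratings
toy_long_complete <- toy_long[!is.na(toy_long$score), ]
table(toy_long_complete$id)
```

Unlike `pivot_longer()`, this orders the rows by rater rather than by subject, but the row order does not matter for fitting the mixed model.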
The variances that are used to compute the ICC are obtained from a linear mixed model. This model is estimated with the `lmer()` function from the `lme4` package. The model is defined as $Y_{ijr} = \beta_0 + b_{0j} + b_{0r} + \epsilon_{ijr}$, where $\beta_0$ is the fixed intercept, $b_{0j}$ is the random intercept at the subject level and $b_{0r}$ the random intercept at the rater/observer level. The $\epsilon_{ijr}$ is the residual error. The R code for the model in `lme4` is:

```r
lme4::lmer(score ~ (1 | id) + (1 | observer), data = data, REML = TRUE)
```
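One way to see what each term of the model stands for is to generate data from it. The base-R sketch below simulates long-format ratings from $Y_{ijr} = \beta_0 + b_{0j} + b_{0r} + \epsilon_{ijr}$; the intercept, the three variances and the names `var_subject`, `var_rater` and `var_resid` are hypothetical choices, not values from the breast data.

```r
# Hypothetical values for the fixed intercept and the three variance components
set.seed(2024)
n <- 50           # number of subjects
k <- 5            # number of raters
beta_0 <- 20      # fixed intercept
var_subject <- 9  # variance of the subject intercepts b_0j
var_rater <- 1    # variance of the rater intercepts b_0r
var_resid <- 4    # variance of the residuals e_ijr

# Draw one random intercept per subject and one per rater
b_0j <- rnorm(n, mean = 0, sd = sqrt(var_subject))
b_0r <- rnorm(k, mean = 0, sd = sqrt(var_rater))

# Long format: one row per subject-rater combination, plus a residual per row
sim_long <- expand.grid(id = seq_len(n), rater = seq_len(k))
sim_long$score <- beta_0 + b_0j[sim_long$id] + b_0r[sim_long$rater] +
  rnorm(nrow(sim_long), mean = 0, sd = sqrt(var_resid))

head(sim_long)
```

If `lme4` is installed, fitting `lme4::lmer(score ~ (1 | id) + (1 | rater), data = sim_long)` should give variance estimates in the neighbourhood of 9, 1 and 4; with only 5 raters, the rater variance is the least precisely estimated of the three.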
This same model is used to estimate the variance components for each of the three types of ICC: ICC oneway, ICC agreement and ICC consistency. Each ICC applies to a different study design and has different assumptions. The following variance components can be obtained directly from the model:
```r
# Estimate the model for the example data
model <- icc_model2(breast_long)
model
# Extract the variance components from the model
# (columns 1 and 4 are the grouping level and its variance)
varcomp <- as.data.frame(lme4::VarCorr(model))
varcomp[, c(1, 4)]
```
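For complete, balanced data (every subject rated by every rater), the same three variance components can also be estimated with the classical two-way ANOVA method of moments, which is the route taken by @shrout1979. The sketch below is such a swapped-in technique, not how `icc()` works; the function name `anova_varcomp()` is hypothetical. Moment estimates can become negative, whereas the variances from `lmer()` are constrained to be non-negative.

```r
# Method-of-moments (two-way ANOVA) variance components for a complete,
# balanced matrix of ratings: rows are subjects, columns are raters.
# Illustrative alternative to lmer(), not the Agree implementation.
anova_varcomp <- function(y) {
  stopifnot(is.matrix(y), !anyNA(y))
  n <- nrow(y)
  k <- ncol(y)
  grand <- mean(y)
  ss_subject <- k * sum((rowMeans(y) - grand)^2)
  ss_rater <- n * sum((colMeans(y) - grand)^2)
  ss_resid <- sum((y - grand)^2) - ss_subject - ss_rater
  ms_subject <- ss_subject / (n - 1)
  ms_rater <- ss_rater / (k - 1)
  ms_resid <- ss_resid / ((n - 1) * (k - 1))
  # Solve the expected mean squares for the variance components
  c(subject = (ms_subject - ms_resid) / k,
    rater = (ms_rater - ms_resid) / n,
    residual = ms_resid)
}

# Hypothetical ratings: subject effects 1..4 plus rater effects 0, 2, 4, no noise
y <- outer(c(1, 2, 3, 4), c(0, 2, 4), "+")
anova_varcomp(y)
```

Because this toy matrix contains no noise, the residual variance is (numerically) zero, and the subject and rater variances equal the sample variances of the two sets of effects, 5/3 and 4.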
The `icc()` function implements three types of ICC: the ICC oneway, the ICC agreement and the ICC consistency.
The ICC type oneway is the variance between the subjects ($\sigma^2_j$) divided by the sum of the subject variance ($\sigma^2_j$) and the residual variance ($\sigma^2_{\epsilon}$). The $ICC_{oneway}$ is computed as follows:
$ICC_{oneway} = \frac{\sigma^2_j}{\sigma^2_j + \sigma^2_{\epsilon}}$
The ICC oneway assumes that each subject is rated by a different set of raters that are randomly selected from a larger population of judges (@shrout1979). The `icc_oneway()` function uses the `icc_model()` function to compute the variances. This is an `lmer` model with a random intercept for the subjects as well as for the raters. The rater variance is not used separately for the ICC oneway: it is subtracted from the subject variance summed over the $k$ raters, and the result is averaged over the raters:

$\sigma^2_{j,oneway} = \frac{k\sigma^2_j - \sigma^2_r}{k}$

The error variance is computed as the sum of the residual variance and the rater variance from the `icc_model()`:

$\sigma^2_{\epsilon,oneway} = \sigma^2_r + \sigma^2_\epsilon$

Accordingly, the rater variance is part of the error variance. These oneway quantities take the place of $\sigma^2_j$ and $\sigma^2_\epsilon$ in the $ICC_{oneway}$ formula above and in the $sem$ and F statistic below.
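The conversion from two-way components to the oneway quantities can be sketched in a few lines. The helper `icc_oneway_from_components()` and the variance values are hypothetical illustrations, not part of the `Agree` API.

```r
# Hypothetical helper: convert two-way variance components into the oneway
# subject and error variances and compute the ICC oneway from them.
icc_oneway_from_components <- function(var_subject, var_rater, var_resid, k) {
  # Rater variance is subtracted from the summed subject variance, then averaged over k
  var_subject_oneway <- (k * var_subject - var_rater) / k
  # Rater variance becomes part of the error variance
  var_error_oneway <- var_rater + var_resid
  c(var_subject = var_subject_oneway,
    var_error = var_error_oneway,
    icc = var_subject_oneway / (var_subject_oneway + var_error_oneway))
}

# Hypothetical variance components for 5 raters
icc_oneway_from_components(var_subject = 9, var_rater = 1, var_resid = 4, k = 5)
```

With these numbers the oneway subject variance is $(5 \cdot 9 - 1)/5 = 8.8$ and the error variance is $1 + 4 = 5$, so the ICC oneway is $8.8/13.8 \approx 0.638$. The same value follows from the Shrout and Fleiss mean-square formula $(BMS - WMS)/(BMS + (k-1)WMS)$ when the between-subject and within-subject mean squares are replaced by their expectations, $BMS = k\sigma^2_j + \sigma^2_\epsilon = 49$ and $WMS = \sigma^2_r + \sigma^2_\epsilon = 5$, which gives $44/69$.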
The standard error of measurement ($sem$) is the square root of this error variance (i.e. $sem = \sqrt{\sigma^2_{\epsilon}}$). The confidence intervals are computed with the exact F method: $F = \frac{k \sigma^2_{j} + \sigma^2_{\epsilon}}{\sigma^2_{\epsilon}}$, with $df1 = n - 1$ and $df2 = n(k - 1)$ (@shrout1979).
```r
# ICC oneway
icc(breast_long, format = "long", method = "oneway")
# ICC oneway with variance components
icc(breast_long, format = "long", method = "oneway", var = TRUE)
```
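The $sem$ and the exact F interval for the ICC oneway can be sketched as follows, using the ICC(1,1) interval of @shrout1979: the observed F is divided (lower bound) or multiplied (upper bound) by the critical F value, and each bound is mapped back to the ICC scale with $(F - 1)/(F + k - 1)$. The helper `icc_oneway_ci()` and its inputs are hypothetical; the exact code inside `icc()` may differ in details.

```r
# Hypothetical helper: sem and exact F confidence interval for the ICC oneway,
# following the ICC(1,1) interval of Shrout and Fleiss (1979).
# var_subject and var_error are the oneway subject and error variances.
icc_oneway_ci <- function(var_subject, var_error, n, k, alpha = 0.05) {
  f_obs <- (k * var_subject + var_error) / var_error
  df1 <- n - 1
  df2 <- n * (k - 1)
  # Scale the observed F by the critical values of the F distribution
  f_lower <- f_obs / qf(1 - alpha / 2, df1, df2)
  f_upper <- f_obs * qf(1 - alpha / 2, df2, df1)
  c(icc = var_subject / (var_subject + var_error),
    sem = sqrt(var_error),
    lower = (f_lower - 1) / (f_lower + k - 1),
    upper = (f_upper - 1) / (f_upper + k - 1))
}

# Hypothetical oneway variances for 50 subjects and 5 raters
icc_oneway_ci(var_subject = 8.8, var_error = 5, n = 50, k = 5)
```

Plugging the observed F itself into $(F - 1)/(F + k - 1)$ returns exactly the point estimate $\sigma^2_j / (\sigma^2_j + \sigma^2_\epsilon)$, which is why the two bounds fall on either side of it.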
The ICC type agreement is the variance between the subjects ($\sigma^2_j$) divided by the sum of the subject variance ($\sigma^2_j$), the rater variance ($\sigma^2_r$) and the residual variance ($\sigma^2_\epsilon$). The $ICC_{agreement}$ is computed as follows:

$ICC_{agreement} = \frac{\sigma^2_j}{\sigma^2_j + \sigma^2_r + \sigma^2_{\epsilon}}$
The ICC agreement generalizes to other raters within a population (@shrout1979). All subjects are rated by the same set of raters, and the rater variance is taken into account in the calculation of the ICC. The variance components are computed with the `icc_model()` function. This is an `lmer` model with a random intercept for the subjects and for the raters. The $sem$ is the square root of the sum of the rater variance and the residual variance (i.e. $sem = \sqrt{\sigma^2_r + \sigma^2_\epsilon}$). Because the ICC agreement combines three independent variance components, its confidence intervals are approximated, as described by @satter1946 and @shrout1979.
```r
# ICC agreement
icc(breast_long, format = "long", method = "agreement")
# ICC agreement with variance components
icc(breast_long, format = "long", method = "agreement", var = TRUE)
```
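The approximate interval for the ICC agreement can be sketched with the Satterthwaite-type formula that @shrout1979 give for ICC(2,1), here expressed through the mean squares implied by the variance components ($k\sigma^2_j + \sigma^2_\epsilon$ for subjects, $n\sigma^2_r + \sigma^2_\epsilon$ for raters and $\sigma^2_\epsilon$ for the residual). This is an assumption about the general approach, not a copy of the `Agree` source; the helper name `icc_agreement_ci()` and the input values are hypothetical.

```r
# Hypothetical helper: ICC agreement, sem and an approximate confidence
# interval following the Shrout & Fleiss (1979) ICC(2,1) formula.
icc_agreement_ci <- function(var_subject, var_rater, var_resid, n, k, alpha = 0.05) {
  icc <- var_subject / (var_subject + var_rater + var_resid)
  sem <- sqrt(var_rater + var_resid)
  # Expected mean squares implied by the variance components
  ms_subject <- k * var_subject + var_resid
  ms_rater <- n * var_rater + var_resid
  ms_error <- var_resid
  # Satterthwaite approximation of the degrees of freedom
  a <- k * icc / (n * (1 - icc))
  b <- 1 + k * icc * (n - 1) / (n * (1 - icc))
  v <- (a * ms_rater + b * ms_error)^2 /
    ((a * ms_rater)^2 / (k - 1) + (b * ms_error)^2 / ((n - 1) * (k - 1)))
  f_lower <- qf(1 - alpha / 2, n - 1, v)
  f_upper <- qf(1 - alpha / 2, v, n - 1)
  lower <- n * (ms_subject - f_lower * ms_error) /
    (f_lower * (k * ms_rater + (k * n - k - n) * ms_error) + n * ms_subject)
  upper <- n * (f_upper * ms_subject - ms_error) /
    (k * ms_rater + (k * n - k - n) * ms_error + n * f_upper * ms_subject)
  c(icc = icc, sem = sem, lower = lower, upper = upper)
}

# Hypothetical components for 50 subjects and 5 raters
icc_agreement_ci(var_subject = 9, var_rater = 1, var_resid = 4, n = 50, k = 5)
```

Setting both critical F values to 1 collapses the two bounds onto the point estimate $9/14$, a quick check that the expressions are consistent with the $ICC_{agreement}$ formula.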
The ICC type consistency is the variance between the subjects ($\sigma^2_j$) divided by the sum of the subject variance ($\sigma^2_j$) and the residual variance ($\sigma^2_\epsilon$). The rater variance is separated from the subject variance and error variance, but the rater variance is not used to calculate the ICC. The rater variance can therefore also be considered as a fixed effect. The $ICC_{consistency}$ is computed as follows:
$ICC_{consistency} = \frac{\sigma^2_j}{\sigma^2_j + \sigma^2_{\epsilon}}$
The ICC consistency generalizes only to the set of raters in the data (@shrout1979). The `icc_model()` function is used to compute the variances. This is an `lmer` model with a random intercept for the subjects as well as for the raters. The $sem$ is the square root of the residual variance, ignoring the variance between raters (i.e. $sem = \sqrt{\sigma^2_\epsilon}$). The confidence intervals are computed with the exact F method: $F = \frac{k \sigma^2_j + \sigma^2_\epsilon}{\sigma^2_\epsilon}$, with $df1 = n - 1$ and $df2 = (n - 1)(k - 1)$ (@shrout1979).
```r
# ICC consistency
icc(breast_long, format = "long", method = "consistency")
# ICC consistency with variance components
icc(breast_long, format = "long", method = "consistency", var = TRUE)
```
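For the ICC consistency, the same exact F construction as for the oneway ICC applies, but with the consistency F statistic and $df2 = (n - 1)(k - 1)$. In the sketch below, the rater variance is deliberately not an argument, which mirrors the fact that it plays no role in this ICC. The helper `icc_consistency_ci()` and the values are hypothetical.

```r
# Hypothetical helper: ICC consistency, sem and exact F confidence interval
# (ICC(3,1) in Shrout & Fleiss, 1979). The rater variance is not needed.
icc_consistency_ci <- function(var_subject, var_resid, n, k, alpha = 0.05) {
  icc <- var_subject / (var_subject + var_resid)
  sem <- sqrt(var_resid)
  f_obs <- (k * var_subject + var_resid) / var_resid
  df1 <- n - 1
  df2 <- (n - 1) * (k - 1)
  f_lower <- f_obs / qf(1 - alpha / 2, df1, df2)
  f_upper <- f_obs * qf(1 - alpha / 2, df2, df1)
  c(icc = icc, sem = sem,
    lower = (f_lower - 1) / (f_lower + k - 1),
    upper = (f_upper - 1) / (f_upper + k - 1))
}

# Hypothetical components for 50 subjects and 5 raters
icc_consistency_ci(var_subject = 9, var_resid = 4, n = 50, k = 5)
```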
The differences in computation between the ICC methods are easiest to see in the variance components. We can obtain the variances by using `var = TRUE` in the `icc()` function; `varr` shows the variance between the raters. Only the ICC agreement takes this rater variance into account as a separate component.
```r
# ICC for all methods
icc(breast_long, format = "long", var = TRUE)
```
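To see how the same variance components feed into the three formulas, the sketch below computes all three ICCs from one hypothetical set of two-way components. The helper `compare_icc()` is illustrative only and does not reproduce the output format of `icc()`.

```r
# Hypothetical helper: the three ICC types from one set of two-way variance
# components. Only the agreement ICC keeps the rater variance as a separate
# term in the denominator; the oneway ICC moves it into the error variance.
compare_icc <- function(var_subject, var_rater, var_resid, k) {
  var_subject_oneway <- (k * var_subject - var_rater) / k
  var_error_oneway <- var_rater + var_resid
  c(oneway = var_subject_oneway / (var_subject_oneway + var_error_oneway),
    agreement = var_subject / (var_subject + var_rater + var_resid),
    consistency = var_subject / (var_subject + var_resid))
}

# Hypothetical components for 5 raters
round(compare_icc(var_subject = 9, var_rater = 1, var_resid = 4, k = 5), 3)
```

Because the consistency denominator lacks the rater variance, the ICC consistency can never be smaller than the ICC agreement computed from the same components.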
When we estimate the ICC for the surgeons only, the variance at the rater level is smaller. This is directly reflected in the ICC.

The `icc()` function also accepts data in wide format; the `cols` argument defines the rater columns that we want to use.
```r
# ICC for all methods, surgeons only
icc(breast_scores, format = "wide",
    cols = c("PCH1_score", "PCH2_score", "PCH3_score", "PCH4_score", "PCH5_score"),
    var = TRUE)
```
When we estimate the ICC for the mammography nurses only, the variance at the rater level is larger. This is directly reflected in the ICC.
```r
# ICC for all methods, mammography nurses only
icc(breast_scores, format = "wide",
    cols = c("Mam1_score", "Mam2_score", "Mam3_score"),
    var = TRUE)
```
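The mechanism behind these differences between rater groups can be isolated with hypothetical numbers: keep the subject and residual variances fixed and vary only the rater variance. The values below are made up and are not estimates from the breast data.

```r
# Hypothetical subject and residual variances; only the rater variance changes
var_subject <- 9
var_resid <- 4
var_rater <- c(0, 0.5, 1, 2, 4)

rater_effect <- data.frame(
  var_rater = var_rater,
  # The agreement ICC includes the rater variance in its denominator
  agreement = var_subject / (var_subject + var_rater + var_resid),
  # The consistency ICC ignores it, so it stays constant
  consistency = var_subject / (var_subject + var_resid)
)
round(rater_effect, 3)
```

As the rater variance grows, the ICC agreement drops while the ICC consistency stays the same; with zero rater variance the two coincide.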
| Term | Description |
|-----|------------------------------------|
| $\beta_0$ | Fixed intercept |
| $b_{0j}$ | Random intercept at subject level |
| $b_{0r}$ | Random intercept at rater level |
| $\epsilon_{ijr}$ | Residual error |
| $\sigma^2_j$ | Variance between subjects |
| $\sigma^2_{j,oneway}$ | Variance between subjects without considering rater variance (ICC oneway) |
| $\sigma^2_r$ | Variance between raters |
| $\sigma^2_\epsilon$ | Residual error variance |
| $\sigma^2_{\epsilon,oneway}$ | Error variance without separating the rater variance (ICC oneway) |
| $k$ | Number of raters/observers |
| $n$ | Number of subjects |