knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette illustrates how to estimate a longitudinal IRT following the methodology described in Proust-Lima et al. (2022 - https://doi.org/10.1016/j.ymeth.2022.01.005). The model combines a graded response measurement model to link a series of binary or ordinal items measured repeatedly over time with their underlying latent process. Simultaneously, a linear mixed model describes the trajectory of the underlying latent process over time.
load("models_IRT.RData")
The longitudinal IRT model is estimated via multlcmm function of lcmm R package (Proust-Lima et al. 2017). The following libraries are used in this vignette:
library(lcmm) library(ggplot2) library(ggpubr) library(splines) library(gridExtra) library(dplyr)
The illustration is on a simulated dataset that mimics the PREDIALA study described and analyzed in Proust-Lima et al. (2021). The dataset is called simdataIRT. It contains the following information:
str(simdataHADS)
demo <- simdataHADS %>% group_by(ID) %>% arrange(time) %>% filter(row_number()==1) summary(demo)
The timescale of interest is the time in months spent on the waiting list: time. This timescale poses two problems:
As in many studies, entry in the study occurs for each individual at different timings in the progression of the disease. this is called delayed entry. It differs from the usual time in the study where entry corresponds to time 0. Here entry (variable time_entry) corresponds to the duration on the waiting list at inclusion, and time 0 is unlikely observed (i.e., entry in the cohort does not occur as the same time as entry on waiting list).
The timescale is continuous per se, with as many values as the number of total observations. This prevents the use of methods considering discrete timescales with measurements at the same times for all the individuals, such as classical longitudinal latent variable models.
Distribution of the time at entry and time at follow_ups
tempSim <- simdataHADS %>% group_by(ID) %>% arrange(time) %>% mutate(Visit=ifelse(time==time_entry,"Entry","Follow-up")) p <- ggplot(tempSim, aes(x=time,fill=Visit,color=Visit)) + geom_histogram(binwidth=1,aes(y=..density..)) + labs(x="Months on the waiting list") p + scale_color_grey(start = 0.1, end = 0.5)+scale_fill_grey(start = 0.1, end = 0.5) + theme_classic()
Quantiles of the distribution of measurement times
quantile(simdataHADS$time,probs=(0:10)/10)
We consider here a longitudinal IRT model with natural cubic splines on time on the waiting list to account for a possible nonlinear trajectory over time, and we adjust the trajectory for the group. We consider 2 internal knots placed at 7 and 15, and shift the right boundary at 60 due to the long tail of the distribution. In the main analysis, we estimate a model with no differential item functioning (DIF) on the group and no response shift (RS) on time.
To reduce the computation time, we first estimate a model with only 1 random-effect, a random intercept. And then, we use the estimates as initial values for the estimation of the model that also includes random-effects on the time functions. Estimation involves a numerical integration approximated by quasi Monte-Carlo (QMC) with 1000 points. This induces very intensive and long computations but was shown to give accurate results in simulations.
modIRT_i <- multlcmm(hads_2 + hads_4 +hads_6 + hads_8 +hads_10+hads_12 + hads_14 ~ ns(time,knots=c(7,15),Boundary.knots = c(0,60))*grp,random=~1,data=simdataHADS,subject="ID",link="thresholds",methInteg="QMC",nMC=1000) # use the estimates as initial values - the vector c(0,1,0,0,1,0,0,0,1) initializes the cholesky matrix of the random-effects Binit <- c(modIRT_i$best[1:7],c(0,1,0,0,1,0,0,0,1),modIRT_i$best[8:length(modIRT_i$best)]) modIRT <- multlcmm(hads_2 + hads_4 +hads_6 + hads_8 +hads_10+hads_12 + hads_14 ~ ns(time,knots=c(7,15),Boundary.knots = c(0,60))*grp,random=~1+ns(time,knots=c(7,15),Boundary.knots = c(0,60)),data=simdataHADS,subject="ID",link="thresholds",methInteg="QMC",nMC=1000, B=Binit)
The summary of the estimation:
summary(modIRT)
The predicted trajectory of the underlying process can be obtained with predictL function and the associated plot function.
datnew <- data.frame(time = seq(0,75,by=1)) datnew$grp <- 0 pIRT0 <- predictL(modIRT,datnew,var.time="time",confint = T) datnew$grp <- 1 pIRT1 <- predictL(modIRT,datnew,var.time="time",confint = T) plot(pIRT0,col=1,lwd=2,ylim=c(-1.5,1.5),legend=NULL,main="",ylab="latent depressive symptomatology",xlab="months since entry on the waiting list",type="l",bty="l",shades=T) plot(pIRT1,add=T,col=4,lwd=2,shades=T) legend(x="topleft",legend=c("dialysed","preemptive"),lty=c(1,1),col=c(1,4),lwd=2,bty="n")
To better appreciate the range of the underlying depressive symptomatology, the empirical posterior distribution can be computed
beta <- modIRT$best t <- 0:72 Z <- cbind(rep(1,length(t)),ns(t,knots=c(7,15),Boundary.knots = c(0,60))) chol <- matrix(0,ncol=4,nrow=4) chol[upper.tri(chol, diag = T)] <- c(1,beta[7:15]) library(mvtnorm) Lambda0 <- rmvnorm(10000,mean = Z%*%c(0,beta[1:3]),Z%*%t(chol)%*%chol%*%t(Z)) Lambda1 <- rmvnorm(10000,mean = Z%*%beta[4:7],Z%*%t(chol)%*%chol%*%t(Z)) Lambda <- data.frame(Lambda = as.vector(rbind(Lambda0,Lambda1))) phist <- ggplot(Lambda,aes(x=Lambda))+ geom_density(color="grey", fill="grey") + theme_bw() + xlab("underlying depressive symptomatology") +xlim(-7,7) phist
The 95\% range of the underlying distribution is approximately:
quantile(Lambda$Lambda,p=c(0.025,0.975))
The location and discrimination parameters are transformations of the estimated parameters. They are retrieved with the following code:
## Parameters nlevel <- 4 nitems <- 7 levels <- rep(nlevel,nitems) npm <- length(modIRT$best) seuils <- modIRT$best[(npm-(nlevel-1)*(nitems)+1):(npm)] err <- abs(modIRT$best[(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems))]) seuils err # Variance Vseuils <- VarCov(modIRT)[(npm-(nlevel-1)*(nitems)+1):(npm),(npm-(nlevel-1)*(nitems)+1):(npm)] Verr <- VarCov(modIRT)[(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems)),(npm-(nlevel-1)*(nitems)-(nitems-1)):(npm-(nlevel-1)*(nitems))]
# generic function location <- function(min,max,param,Vparam){ loc <- param[1] se <- sqrt(Vparam[1,1]) param2 <- rep(0,length(param)) param2[1] <- 1 if ((max-min)>1) { for (l in 1:(max-min-1)) { param2[l+1] <- 2*param[l+1] loc[l+1] <- loc[l] + param[1+l]^2 se[l+1] <- sqrt(t(param2) %*%Vparam %*%param2) } } return(c(loc,se)) } # application ItemLoc <- sapply(1:nitems,function(k){location(min=0,max=nlevel-1,param=seuils[((nlevel-1)*(k-1)+1):((nlevel-1)*k)],Vparam=Vseuils[((nlevel-1)*(k-1)+1):((nlevel-1)*k),((nlevel-1)*(k-1)+1):((nlevel-1)*k)])}) colnames(ItemLoc) <- paste("Item",(1:nitems)*2) ItemLoc <- ItemLoc[c(1,4,2,5,3,6),] rownames(ItemLoc) <- rep(c("Threshold","SE"),nlevel-1) discrimination <- 1/abs(err) sediscr <- diag(err^(-2))%*%Verr%*%diag(err^(-2))
The 3 thresholds and discrimination estimates of each item are:
t(rbind(ItemLoc,discrimination,Se=sqrt(diag(sediscr))))
The curve of each item category probability according to the underlying level of depressive symptomatology can be obtain usinf the ItemInfo function.
## computations info_modIRT <- ItemInfo(modIRT, lprocess=seq(-6,6,0.1)) meaning <- c("Enjoy","Laugh","Cheerful" ,"Slow" ,"Appearance" ,"Looking Forward" ,"Leisure") items <- paste("hads", seq(2,14,2), sep="_") ## automatic graph par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0)) for(k in 1:7) { plot(info_modIRT, which="LevelProb", outcome=items[k], main=paste("Item",2*k,"-",meaning[k]), lwd=2, legend=NULL) } plot(0,axes=FALSE, xlab="", ylab="", col="white") legend("center", legend=paste("Level",0:3), col=1:4, lwd=2, box.lty=0)
## graph with ggplot p <- NULL for (k in 1:7){ ICC <- info_modIRT$LevelInfo[which(info_modIRT$LevelInfo[,1]==items[k]),] p[[k]] <- (ggplot(ICC) + geom_line(aes(x = Lprocess, y = Prob, group = Level,color=Level), show.legend = F,alpha = 1,size=1.2) # + stat_smooth(aes(x = time, y = hads_scorea), method = "loess", size = 0.75) + theme_bw() + labs(title=paste("Item",2*k,"-",meaning[k])) + xlab("construct") + ylim(0,1) + ylab("Probability of item level")) } p[[8]] <- (ggplot(ICC) + geom_line(aes(x = Lprocess, y = Prob, group = Level,color=Level),alpha = 1,size=1.2) + theme_bw() ) legend <- get_legend(p[[8]]) grid.arrange(p[[1]],p[[2]],p[[3]],p[[4]],p[[5]],p[[6]],p[[7]],as_ggplot(legend),ncol=4)
The following code computes the expectation of each item according to the underlying level of depressive symptomatology. This is achieved with predictYcond function with two plot possibilities: direct plot function or ggplot
expe <- predictYcond(modIRT,lprocess = seq(-6,6,by=0.1)) # via the internal plot function plot(expe, xlab="underlying depressive symptomatology", main="Item Expectation Curves", legend=paste("Item",(1:nitems)*2), lwd=2)
# via ggplot j <- table(expe$pred$Yname)[1] expe$pred$item <- as.factor(c(rep(2,j),rep(4,j),rep(6,j),rep(8,j),rep(10,j),rep(12,j),rep(14,j))) p <- (ggplot(expe$pred) + geom_line(aes(x = Lprocess, y = Ypred, group=item,color=item),alpha = 1,size=1.2) + theme_bw() + xlab("underlying depressive symptomatology") + ylim(0,3) + ylab("Item Expectation")) p
The level of information brought by each item category (information share) and brought in total by each item is also computed by the ItemInfo function. The curves can be again plotted directly with options which="LevelInfo" and which="ItemInfo" respectively.
par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0)) for(k in 1:7) { plot(info_modIRT, which="LevelInfo", outcome=items[k], main=paste("Item",2*k,"-",meaning[k]), lwd=2, legend=NULL, ylim=c(0,1.3)) } plot(0,axes=FALSE, xlab="", ylab="", col="white") legend("center", legend=paste("Level",0:3), col=1:4, lwd=2, box.lty=0)
plot(info_modIRT, which="ItemInfo", lwd=2, legend.loc="topleft")
Item predicted trajectories according to a specific profile of covariates can be computed using predictY function:
head(datnew) datnew$grp <- 0 ns0 <- predictY(modIRT,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T) datnew$grp <- 1 ns1 <- predictY(modIRT,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T)
par(mfrow=c(2,4), mar=c(3,2,2,1), mgp=c(2,0.5,0)) for(k in 1:7){ plot(ns0,outcome = k,shades = T,ylim=c(0,3),bty="l",legend=NULL,main=paste("Item",2*k,"-",meaning[k]),ylab="Item level",xlab="months on the waiting list") plot(ns1,outcome=k,shades=T,add=T,col=2) }
DIF is programmed using contrasts (item-specific departure around the mean effect on the underlying latent process)
# initialization of the parameter vector for faster convergence npm <- length(modIRT$best) Binit <- c(modIRT$best[1:7],rep(0,(nitems-1)),modIRT$best[(npm-nlevel*nitems-9+1):npm]) # estimation modIRT_DIFg <- multlcmm(hads_2 + hads_4 +hads_6 + hads_8 +hads_10+hads_12 + hads_14 ~ ns(time,knots=c(7,15),Boundary.knots = c(0,60))*(grp) +contrast(grp),random=~1+ns(time,knots=c(7,15),Boundary.knots = c(0,60)),data=simdataHADS,subject="ID",link="thresholds",methInteg="QMC",nMC=1000,B=Binit)
sumDIF <- summary(modIRT_DIFg) sumDIF[,2]
To be done again .... L'item 2 est le seul item qui semble être différent entre les groupes (p=0.0071) avec un niveau plus faible chez les preemptive par rapport aux autres items. Au global, la différence de groupe entre les 7 items ne semble pas significative (p=0.2665 au global (Chi2 à 6 degrés de liberté)).
Global test for contrasts
WaldMult(modIRT_DIFg,pos=c(8:13))
95\% confidence interval of the difference between groups for item 2:
sum <- summary(modIRT_DIFg)[10,] c(sum[1],sum[1]- qnorm(0.975)*sum[2],sum[1]+ qnorm(0.975)*sum[2])
Response Shift is modelled by adding contrasts on the functions of time
# initialization of the parameter vector for faster convergence npm <- length(modIRT$best) Binit <- c(modIRT$best[1:7],rep(0,3*(nitems-1)),modIRT$best[(npm-nlevel*nitems-9+1):npm]) # estimation modIRT_DIFt <- multlcmm(hads_2 + hads_4 +hads_6 + hads_8 +hads_10+hads_12 + hads_14 ~ ns(time,knots=c(7,15),Boundary.knots = c(0,60))*(grp) + contrast(ns(time,knots=c(7,15),Boundary.knots = c(0,60))),random=~1+ns(time,knots=c(7,15),Boundary.knots = c(0,60)),data=simdataHADS,subject="ID",link="thresholds",methInteg="QMC",nMC=1000,B=Binit)
summary(modIRT_DIFt)
There does not seem to be any difference in item trajectories over time (see global p-values for each function of time in the summary). We can seek whether there are some difference item by item using Wald Test. This can be done with the WaldMult function of lcmm except for the last item since this parameter is a combination of the others. Next code details how to obtain this item-specific test of invariance over time.
## Pvalue for the last contrast b <- coef(modIRT_DIFt) v <- vcov(modIRT_DIFt) A <- rbind(c(rep(0,7), rep(-1,6), rep(0,49)), c(rep(0,7+6), rep(-1,6), rep(0,49-6)), c(rep(0,7+12), rep(-1,6), rep(0,49-12))) w <- t(A%*%b) %*% solve(A%*%v%*%t(A)) %*% A%*%b DIF14 <- 1-pchisq(w, df=nrow(A)) # p=0.3722833
## pvalues for all the items including the last one DIF <- cbind(seq(2,14,by=2),c(sapply(1:6,function(k){WaldMult(modIRT_DIFt,pos=c(7+k,(7+6+k),(7+2*6+k)))[2]}),DIF14)) colnames(DIF) <- c("item","pvalue") DIF
Interpretation.
head(datnew) datnew$grp <- 0 ns0DIFt <- predictY(modIRT_DIFt,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T) datnew$grp <- 1 ns1DIFt <- predictY(modIRT_DIFt,var.time = "time",newdata=datnew,methInteg = 1,nsim=2000,draws=T)
par(mfrow=c(3,3), mar=c(3,2,2,1), mgp=c(2,0.5,0)) for(k in 1:7){ plot(ns0,outcome = k,shades = T,ylim=c(0,3),bty="l",legend=NULL,main=paste("Item",2*k,"-",meaning[k]),ylab="Item level",xlab="months on the waiting list",color=1,lwd=2,xlim=c(0,50)) plot(ns0DIFt,outcome=k,shades=T,lty=2,add=T,col=1,lwd=2) plot(ns1,outcome=k,shades=T,add=T,col=4,lwd=2) plot(ns1DIFt,outcome=k,shades=T,add=T,col=4,lty=2,lwd=2) legend("top",legend=paste("(RS overall test: p = ",round(DIF[k,2],digits = 3),")",sep=""),bty="n") } plot(0,xaxt='n',yaxt='n',bty='n',pch='',ylab='',xlab='', main ='') legend(x="top",legend=c("dialysed","pre-emptive"),lty=c(1,1),col=c(1,4),lwd=2,bty="n") legend(x="bottom",legend=c("without RS","with RS"),lty=c(1,2),col=c("gray","gray"),lwd=2,bty="n")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.