In YiFengEDMS/simPM: SIMulation-based power analysis for Planned Missing designs

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

In this vignette we will introduce the item-level PHPM designs. Different from wave-level PHPM designs, in item-level PHPM designs, the missingness is imposed at the item(or indicator) level. In this type of designs, subjects can be assigned to participate in a specific wave of data collection; but they may be assigned to only provide data on certain measures rather than all the repeated measures.

Compared to wave-level PHPM designs, item-level PHPM designs offer more flexibility. They especially worth consideration when there is a unit cost associated with each observed measurement.

The one potential limitation with balanced item-level PHPM dsigns is that it may result in a large number of missing data patterns. It may be difficult when only a few participants are assigned to each of the missing data patterns. It will also become infeasible to implement when the sample size is smaller than the number of unique missing data patterns.

Search for balanced item-level PHPM designs

To search for item-level PHPM designs using simPM, we only need to specify the methods = "item" argument when using the simPM() function.

Example

In this hypothetical example, Suppose a group of researchers is interested in examining the longitudinal reciprocal relations between peer relationships and teacher-child relationships. They have been funded for a longitudinal panel study following 1000 children for three years. Each year they would collect data on three measures: two measures of peer relationships (self-perceived popularity and peer-perceived popularity) and one measure of teacher-child relationships (teacher-child support). Upon the completion of data collection, they plan to fit an autoregressive cross-lagged model shown below. The parameter of focal interest to the researchers are the cross-lagged path coefficients predicting the peer-perceived popularity from the self-perceived popularity measured at the previous time point, as well as in the cross-lagged paths predicting teacher-child support from the peer-perceived popularity measured at the previous time point (marked in red).

Unfortunately, after the first wave of data collection, the funding agency announces a 30% reduction in the remaining funding. The researcher wishes to continue the project under the budget constraint, with the hope to not compromise the scientific rigor and statistical power. The reseacher thus decideds to use simPM to find a design that yields sufficient power but costs no more than the reduced budget.

knitr::include_graphics('images/Autoregressive-cross-lagged.png')

After supplying the population model and the analysis model, we can use the simPM() function to search for an optimal item-level missing design with the methods = "item" argument. For more details about the specification of other arguments, please refer to this vignette.

popModel <- '

#----------- path coefficients ------------#

SelfPop2~0.609*SelfPop1+0.111*PeerPop1
SelfPop3~0.587*SelfPop2+0.053*PeerPop2

PeerPop2~0.051*SelfPop1+0.647*PeerPop1+(-0.056)*Support1
PeerPop3~0.056*SelfPop2+0.679*PeerPop2+0.075*Support2

Support2~0.468*Support1+0.124*PeerPop1
Support3~0.280*Support2+0.132*PeerPop2

#----------- residual covariance ----------#

PeerPop1~~0.229*SelfPop1+0.064*Support1
SelfPop1~~0.007*Support1

PeerPop2~~0.079*SelfPop2+(-0.036)*Support2
SelfPop2~~0.055*Support2

PeerPop3~~0.052*SelfPop3+0.013*Support3
SelfPop3~~(-0.034)*Support3

#---------------- means -------------------#

PeerPop1~3.273*1
SelfPop1~0.048*1
Support1~2.905*1

PeerPop2~1.247*1
SelfPop2~(-0.343)*1
Support2~1.193*1

PeerPop3~0.814*1
SelfPop3~(-0.161)*1
Support3~1.451*1

#--------------- variances ----------------#

PeerPop1~~0.447*PeerPop1
SelfPop1~~1.010*SelfPop1
Support1~~0.540*Support1

PeerPop2~~0.238*PeerPop2
SelfPop2~~0.607*SelfPop2
Support2~~0.432*Support2

PeerPop3~~0.211*PeerPop3
SelfPop3~~0.380*SelfPop3
Support3~~0.424*Support3

'

analyzeModel <- '

#----------- path coefficients ------------#

SelfPop2~SelfPop1+PeerPop1
SelfPop3~SelfPop2+PeerPop2

PeerPop2~SelfPop1+PeerPop1+Support1
PeerPop3~SelfPop2+PeerPop2+Support2

Support2~Support1+PeerPop1
Support3~Support2+PeerPop2

#----------- residual covariance ----------#

PeerPop1~~SelfPop1+Support1
SelfPop1~~Support1

PeerPop2~~SelfPop2+Support2
SelfPop2~~Support2

PeerPop3~~SelfPop3+Support3
SelfPop3~~Support3


#---------------- means -------------------#

PeerPop1~1
SelfPop1~1
Support1~1

PeerPop2~1
SelfPop2~1
Support2~1

PeerPop3~1
SelfPop3~1
Support3~1


#--------------- variances ----------------#

PeerPop1~~PeerPop1
SelfPop1~~SelfPop1
Support1~~Support1

PeerPop2~~PeerPop2
SelfPop2~~SelfPop2
Support2~~Support2

PeerPop3~~PeerPop3
SelfPop3~~SelfPop3
Support3~~Support3

'

item.ex3 <- simPM(
  popModel,
  analyzeModel,
  VarNAMES=c("PeerPop1","SelfPop1","Support1","PeerPop2","SelfPop2","Support2","PeerPop3","SelfPop3","Support3"),
  distal.var = NULL,
  n=1000,
  nreps=10,
  seed=12345,
  Time=3,
  k=3,
  Time.complete=1,
  costmx=c(5,5,5,10,10,10), 
  pc=0.1,
  pd=0,
  focal.param=c("PeerPop2~SelfPop1","Support2~PeerPop1","PeerPop3~SelfPop2","Support3~PeerPop2"),
  eval.budget=T,          
  rm.budget=31500L,        # remaining available budget 45*1000*0.7
  complete.var=NULL,
  engine="l",
  methods="item"
)

By running the code above, simPM will map out the possible item-level PHPM designs and determine whether the cost of each design is within the remaining budget limit. Please note that when we specify methods = "item", simPM() will only map out the balanced item-level PHPM designs. They are balanced in the sense that each missing data pattern has the same number of missing observed measurements. For more information about imbalanced item-level PHPM designs, please refer to forward assembly.

The program will run Monte Carlo simulations for the plausible designs that cost less than the remaining amount of funding. Comparisons are made among the plausible designs. The design that yields higher empirical statistical power with regard to the focal parameters will be selected as the optimal balanced item-level PHPM design.

setwd("C:/Users/yifeng94/Desktop/simPM/simPM-git/examples")
load("item.ex3_r1.rda")
library(simPM)

In this example, given the budget constraints, there are four plausible balanced item-level PHPM designs. The program thus has run simulations for all the four plausible designs and made comparisons across the designs. The optimal design among the 4 balanced item-level missing designs will cost \$31,500, which is below the reduced available budget.

From the output, we can see that 10% of the participants are assigned to provide complete data across all the future waves of data collection. The rest 90% of the participants are randomly assigned to one of the 15 unique missing data patterns ($n=60$ in each pattern). In each missing data pattern, the participants are assigned to miss two observed indicators (items) in the future waves of data collection (e.g., self-perceived popularity at wave 2 and peer-perceived popularity at wave 3).

Over 1000 replications, this design yields an empirical power of 0.786 for testing the path coefficient $b_{PS_1}$, 0.721 for testing the path coefficient $b_{PS_2}$, 0.917 for testing the path coefficient $b_{TS_1}$, and 0.890 for testing the path coefficient $b_{TS_2}$.