In YiFengEDMS/simPM: SIMulation-based power analysis for Planned Missing designs

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette introduces item-level PHPM designs. Unlike wave-level PHPM designs, item-level PHPM designs impose missingness at the item (or indicator) level. In this type of design, subjects assigned to participate in a given wave of data collection may be asked to provide data on only some of the repeated measures rather than all of them.

Compared to wave-level PHPM designs, item-level PHPM designs offer more flexibility. They are especially worth considering when there is a unit cost associated with each observed measurement.

One potential limitation of balanced item-level PHPM designs is that they may result in a large number of missing data patterns. Implementation can be difficult when only a few participants are assigned to each missing data pattern, and it becomes infeasible when the sample size is smaller than the number of unique missing data patterns.
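To get a feel for how quickly the number of patterns can grow, the sketch below counts patterns for the setting used in the example later in this vignette (nine future measurements, 1,135 participants, 90% of them assigned to a missing data pattern). It assumes that a balanced design with m missing measurements per pattern uses every possible choice of m measurements; the variable names are illustrative and not part of simPM.

```r
# Assumption: a balanced design with m missing items per pattern uses
# every way of choosing m of the future measurements.
n_items <- 9                          # 3 indicators x 3 future waves
m <- 1:8                              # number of items missing per pattern
n_patterns <- choose(n_items, m)

# Participants available for the incomplete patterns (90% of 1,135)
n_incomplete <- floor(0.9 * 1135)

pattern_table <- data.frame(
  missing_per_pattern = m,
  n_patterns = n_patterns,
  max_per_pattern = floor(n_incomplete / n_patterns)
)
pattern_table
```

With four or five missing items per pattern the count peaks at 126 patterns, leaving only about eight participants per pattern in this setting.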

Search for balanced item-level PHPM designs

To search for item-level PHPM designs using simPM, we only need to specify methods = "item" when calling the simPM() function.

Example

In this hypothetical example, a researcher (Mr. Y) is interested in studying the longitudinal trajectories of children's externalizing behaviors. The researcher plans to collect reports on the child's daily behaviors from the mother, the father, and the teacher (from families with heterosexual parents). Once data collection is completed, the researcher intends to model externalizing behavior as a latent variable and investigate its change over time.

Suppose the researcher has proposed a complete-case longitudinal study that collects data on 1,135 children across four waves, at grades 1, 3, 4, and 5. The proposed analysis model is shown below, where the three indicators of the latent construct (externalizing behavior) $\eta$ at each of the four waves are the responses from the mother (M1-M4), the father (F1-F4), and the teacher (T1-T4), respectively.

Although the researcher initially obtained external funding to support the longitudinal study, the funding agency announces a 30% reduction in the remaining funding after the first wave of data collection. Mr. Y wishes to continue the project but has to work within the reduced budget. He also wants to maintain scientific rigor and satisfactory statistical power. He therefore chooses to use simPM to find a design that yields sufficient power but costs no more than the reduced budget.

knitr::include_graphics('images/Second-order-LGM.png')

After supplying the population model and the analysis model, we can use the simPM() function to search for an optimal item-level missing design with the methods = "item" argument. For more details about the specification of other arguments, please refer to this vignette.

popModel='
EXB1=~1.150*F1+0.836*T1+1*M1
EXB2=~1.150*F2+0.836*T2+1*M2
EXB3=~1.150*F3+0.836*T3+1*M3
EXB4=~1.150*F4+0.836*T4+1*M4


interc=~1*EXB1+1*EXB2+1*EXB3+1*EXB4
slope=~0*EXB1+2*EXB2+3*EXB3+4*EXB4

interc~~-0.244*slope

interc~8.289*1
slope~-0.433*1

interc~~18.184*interc
slope~~0.249*slope

EXB1~~1.084*EXB1
EXB2~~1.777*EXB2
EXB3~~1.457*EXB3
EXB4~~1.700*EXB4

T1~-0.214*1
T2~-0.214*1
T3~-0.214*1
T4~-0.214*1

M1~0*1
M2~0*1
M3~0*1
M4~0*1

F1~-1.136*1
F2~-1.136*1
F3~-1.136*1
F4~-1.136*1


M1~~23.886*M1
F1~~17.737*F1
T1~~55.074*T1 
M2~~20.223*M2      
F2~~8.941*F2      
T2~~66.698*T2      
M3~~16.905*M3      
F3~~13.922*F3      
T3~~61.995*T3 
M4~~19.324*M4 
F4~~13.410*F4      
T4~~71.127*T4

F1~~4.256*F2+7.040*F3+5.737*F4
F2~~5.440*F3+3.590*F4
F3~~6.165*F4

T1~~23.603*T2+24.666*T3+23.168*T4
T2~~35.213*T3+29.648*T4
T3~~33.815*T4

M1~~12.975*M2+11.153*M3+11.683*M4
M2~~12.219*M3+11.332*M4
M3~~11.807*M4

'
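As a quick sanity check on the population values, the sketch below computes the means implied by popModel at each wave. It is plain arithmetic on numbers copied from the model above and does not call simPM or lavaan; it assumes the usual second-order growth model relations, where the latent mean at a wave is the intercept mean plus the slope mean times the time score, and each indicator mean is its intercept plus its loading times the latent mean.

```r
# Values copied from popModel above
time_scores <- c(0, 2, 3, 4)      # slope loadings for the four waves
mean_interc <- 8.289
mean_slope  <- -0.433

# Implied mean of the latent externalizing factor at each wave
exb_mean <- mean_interc + mean_slope * time_scores

# Implied indicator means: intercept + loading * latent mean
implied <- rbind(
  M = 0      + 1.000 * exb_mean,
  F = -1.136 + 1.150 * exb_mean,
  T = -0.214 + 0.836 * exb_mean
)
colnames(implied) <- paste0("wave", 1:4)
round(implied, 3)
```

The latent mean falls from 8.289 at wave 1 to 6.557 at wave 4, which reflects the negative mean slope in the population model.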

analyzeModel='

EXB1=~NA*F1+a*F1+b*T1+1*M1
EXB2=~NA*F2+a*F2+b*T2+1*M2
EXB3=~NA*F3+a*F3+b*T3+1*M3
EXB4=~NA*F4+a*F4+b*T4+1*M4

interc=~1*EXB1+1*EXB2+1*EXB3+1*EXB4
slope=~0*EXB1+2*EXB2+3*EXB3+4*EXB4

interc~~slope

interc~1
slope~1
interc~~interc
slope~~slope

EXB1~~EXB1
EXB2~~EXB2
EXB3~~EXB3
EXB4~~EXB4

F1~c*1
F2~c*1
F3~c*1
F4~c*1

T1~d*1
T2~d*1
T3~d*1
T4~d*1

M1~0*1
M2~0*1
M3~0*1
M4~0*1

F1~~F1
F2~~F2
F3~~F3
F4~~F4

T1~~T1
T2~~T2
T3~~T3
T4~~T4

M1~~M1
M2~~M2
M3~~M3
M4~~M4

F1~~F2+F3+F4
F2~~F3+F4
F3~~F4

T1~~T2+T3+T4
T2~~T3+T4
T3~~T4

M1~~M2+M3+M4
M2~~M3+M4
M3~~M4
'

library(simPM)

item.ex2 <- simPM(
  popModel,
  analyzeModel,
  VarNAMES = c("F1","T1","M1","F2","T2","M2",
               "F3","T3","M3","F4","T4","M4"),
  distal.var = NULL,
  n = 1135,
  nreps = 1000,
  seed = 12345,
  Time = 4,
  k = 3,
  Time.complete = 1,
  costmx = c(5,5,5,10,10,10,15,15,15), 
  pc = 0.1,
  pd = 0,
  focal.param = c("interc~1",
                  "slope~1",
                  "interc~~interc",
                  "slope~~slope"),
  eval.budget = TRUE,
  rm.budget = 90*1135*0.7,        # remaining budget after the 30% reduction
  complete.var = NULL,            # specify any observed variables (items) that need complete data to be collected in the future waves
  engine = "l",
  methods = "item"                # type of PHPM designs under consideration, "item" indicates item-level missing
)
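The budget expression in the call above can be unpacked as follows. This is a sketch under the assumption that the 90 in rm.budget is the sum of the nine unit costs in costmx (three measures at each of the three remaining waves), which is how the numbers line up; the variable names are illustrative.

```r
# Unit costs copied from the simPM() call above
costmx <- c(5, 5, 5, 10, 10, 10, 15, 15, 15)
n <- 1135

cost_per_complete_case <- sum(costmx)        # cost of one participant with no missing items
full_cost <- cost_per_complete_case * n      # cost with no planned missingness
rm_budget <- full_cost * 0.7                 # budget left after the 30% reduction

c(per_case = cost_per_complete_case,
  full = full_cost,
  remaining = rm_budget)
```

Under this reading, collecting complete data in the remaining waves would cost 102,150, while only 71,505 is available.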

By running the code above, simPM will map out the possible item-level PHPM designs and determine whether the cost of each design is within the remaining budget limit. Please note that when we specify methods = "item", simPM() will only map out the balanced item-level PHPM designs. They are balanced in the sense that each missing data pattern has the same number of missing observed measurements. For more information about imbalanced item-level PHPM designs, please refer to forward assembly.
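To make the notion of balance concrete, the sketch below enumerates missing data patterns with base R, taking four missing items per pattern as an illustration (the value reported for the optimal design later in this vignette). It assumes the patterns are all possible choices of four of the nine future measurements; this is an illustration of the idea, not necessarily how simPM constructs the patterns internally.

```r
# The nine future measurements (waves 2 to 4), named as in VarNAMES
items <- c("F2", "T2", "M2", "F3", "T3", "M3", "F4", "T4", "M4")

# Assumption for illustration: four items are missing in every pattern
patterns <- combn(items, 4)        # one column per missing data pattern

ncol(patterns)                     # number of unique patterns
patterns[, 1:3]                    # items skipped in the first three patterns

# In how many patterns is each item missing?
item_counts <- table(patterns)
item_counts
```

Every pattern omits the same number of items, and every item is omitted in the same number of patterns, so the planned missingness is spread evenly over measures and waves.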

The program will run Monte Carlo simulations for the plausible designs, that is, those that cost no more than the remaining funding. The plausible designs are then compared, and the design that yields the highest empirical statistical power for the focal parameters is selected as the optimal balanced item-level PHPM design.
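For readers new to simulation-based power analysis, the toy example below shows what "empirical power" means: the proportion of simulated data sets in which the null hypothesis for a parameter is rejected. It uses a one-sample t-test with made-up values for the sample size and effect, purely to illustrate the idea; it is not the model or the procedure that simPM runs.

```r
set.seed(12345)

nreps <- 1000        # number of simulated data sets
n     <- 50          # hypothetical sample size
mu    <- 0.4         # hypothetical true mean (sd = 1)

# One p-value per replication for H0: mean = 0
p_values <- replicate(nreps, t.test(rnorm(n, mean = mu))$p.value)

# Empirical power: share of replications that reject H0 at alpha = .05
empirical_power <- mean(p_values < 0.05)
empirical_power
```

In simPM the same logic applies to each focal parameter, with the data generated from the population model and a planned missing design imposed before the analysis model is fitted.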

setwd("C:/Users/yifeng94/Desktop/simPM/simPM-git/examples")
load("item.ex2_r1.rda")
library(simPM)

In this example, given the budget constraint, there are five plausible balanced item-level PHPM designs. The program therefore ran simulations for all five designs and compared them. The optimal design among the five costs \$61,830, which is below the reduced budget of \$71,505 (90*1135*0.7).

From the output, we can see that 10% of the participants are assigned to provide complete data across all future waves of data collection. The remaining 90% of the participants are randomly assigned to one of the 126 unique missing data patterns ($n=8$ in each pattern). In each missing data pattern, participants are assigned to miss four observed indicators (items) in the future waves of data collection (e.g., the father and mother reports at wave 2 as well as the father and teacher reports at wave 3).
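The reported cost of \$61,830 can be reproduced by hand from the numbers above. The sketch below assumes that the 126 patterns are all choices of four missing items out of nine, that the entries of costmx are per-participant unit costs, and that everyone not assigned to a pattern (127 participants, roughly 10%) provides complete data; the variable names are illustrative.

```r
costmx <- c(5, 5, 5, 10, 10, 10, 15, 15, 15)   # unit costs from the simPM() call
n <- 1135
n_patterns <- choose(9, 4)                      # 126 patterns, 4 of 9 items missing
n_per_pattern <- 8

n_incomplete <- n_patterns * n_per_pattern      # 1008 participants
n_complete <- n - n_incomplete                  # 127 (assumed: everyone else)

# Each item is observed in choose(8, 4) = 70 of the 126 patterns
n_observed_in <- choose(8, 4)

cost_complete <- n_complete * sum(costmx)
cost_incomplete <- n_per_pattern * n_observed_in * sum(costmx)
total_cost <- cost_complete + cost_incomplete
total_cost
```

The total comes to 61,830 (11,430 for the complete-data group plus 50,400 for the planned missing group), which matches the cost reported for the optimal design.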

Over 1000 replications, this design yields an empirical power of 1 for testing the mean intercept (interc~1), the mean slope (slope~1), and the intercept variance (interc~~interc). The empirical statistical power for testing the slope variance (slope~~slope) is 0.79.