In semTools: Useful Tools for Structural Equation Modeling

poolMAlloc

R Documentation

Combine sampling variability with parcel-allocation variability by pooling results across M parcel-allocations

Description

This function employs an iterative algorithm to pick the number of random item-to-parcel allocations needed to meet user-defined stability criteria for a fitted structural equation model (SEM) (see Details below for more information). Pooled point and standard-error estimates from this SEM can be outputted at this final selected number of allocations (however, it is more efficient to save the allocations and treat them as multiple imputations using lavaan.mi::lavaan.mi(); see See Also for links with examples). Additionally, new indices (see Sterba & Rights, 2016) are outputted for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate. At each iteration, this function generates a given number of random item-to-parcel allocations, fits a SEM to each allocation, pools estimates across allocations from that iteration, and then assesses whether stopping criteria are met. If stopping criteria are not met, the algorithm increments the number of allocations used (generating all new allocations).

Usage

poolMAlloc(nPerPar, facPlc, nAllocStart, nAllocAdd = 0,
  parceloutput = NULL, syntax, dataset, stopProp, stopValue,
  selectParam = NULL, indices = "default", double = FALSE,
  checkConv = FALSE, names = "default", leaveout = 0,
  useTotalAlloc = FALSE, ...)

Arguments

`nPerPar`	A list in which each element is a vector, corresponding to each factor, indicating sizes of parcels. If variables are left out of parceling, they should not be accounted for here (i.e., there should not be parcels of size "1").
`facPlc`	A list of vectors, each corresponding to a factor, specifying the item indicators of that factor (whether included in parceling or not). Either variable names or column numbers. Variables not listed will not be modeled or included in output datasets.
`nAllocStart`	The number of random allocations of items to parcels to generate in the first iteration of the algorithm.
`nAllocAdd`	The number of allocations to add with each iteration of the algorithm. Note that if only one iteration is desired, `nAllocAdd` can be set to `0` and results will be output for `nAllocStart` allocations only.
`parceloutput`	Optional `character`. Path (folder/directory) where M (the final selected number of allocations) parceled data sets will be outputted from the iteration where the algorithm met stopping criteria. Note for Windows users: file path must be specified using forward slashes (`/`), not backslashes (`\\`). See `base::path.expand()` for details. If `NULL` (default), nothing is saved to disk.
`syntax`	lavaan syntax that defines the model.
`dataset`	Item-level dataset
`stopProp`	Value used in defining stopping criteria of the algorithm (δ_a in Sterba & Rights, 2016). This is the minimum proportion of change (in any pooled parameter or pooled standard error estimate listed in `selectParam`) that is allowable from one iteration of the algorithm to the next. That is, the change in each pooled estimate and pooled standard error from one iteration to the next must be less than `stopProp` × (value from former iteration). Note that `stopValue` can override this criterion (see below). Also note that values less than .01 are unlikely to lead to more substantively meaningful precision, and that if only `stopValue` is a desired criterion, `stopProp` can be set to 0.
`stopValue`	Value used in defining stopping criteria of the algorithm (δ_b in Sterba & Rights, 2016). `stopValue` is a minimum allowable amount of absolute change (in any pooled parameter or pooled standard error estimate listed in `selectParam`) from one iteration of the algorithm to the next. For a given pooled estimate or pooled standard error, `stopValue` is only invoked as a stopping criterion when the minimum change required by `stopProp` is less than `stopValue`. Note that values less than .01 are unlikely to lead to more substantively meaningful precision. Also note that if only `stopProp` is a desired criterion, `stopValue` can be set to 0.
`selectParam`	(Optional) A list of the pooled parameters to be used in defining stopping criteria (i.e., `stopProp` and `stopValue`). These parameters should appear in the order they are listed in the lavaan syntax. By default, all pooled parameters are used. Note that `selectParam` should only contain freely-estimated parameters. In one example from Sterba & Rights (2016) `selectParam` included all free parameters except item intercepts and in another example `selectParam` included only structural parameters.
`indices`	Optional `character` vector indicating the names of available `lavaan::fitMeasures()` to be included in the output. The first and second elements should be a chi-squared test statistic and its associated degrees of freedom, both of which will be added if missing. If `"default"`, the indices will be `c("chisq", "df", "cfi", "tli", "rmsea","srmr")`. If a robust test statistic is requested (see `lavaan::lavOptions()`), `c("chisq","df")` will be replaced by `c("chisq.scaled","df.scaled")`. For the output to include both the naive and robust test statistics, `indices` should include both, but put the scaled test statistics first, as in `indices = c("chisq.scaled", "df.scaled", "chisq", "df")`
`double`	(Optional) If set to `TRUE`, requires stopping criteria (`stopProp` and `stopValue`) to be met for all parameters (in `selectParam`) for two consecutive iterations of the algorithm. By default, this is set to `FALSE`, meaning stopping criteria need only be met at one iteration of the algorithm.
`checkConv`	(Optional) If set to `TRUE`, the function will output pooled estimates and standard errors from 10 iterations post-convergence.
`names`	(Optional) A character vector containing the names of parceled variables.
`leaveout`	(Optional) A vector of variables to be left out of randomized parceling. Either variable names or column numbers are allowed.
`useTotalAlloc`	(Optional) If set to `TRUE`, function will output a separate set of results that uses all allocations created by the algorithm, rather than M allocations (see "Allocations needed for stability" below). This distinction is further discussed in Sterba and Rights (2016).
`...`	Additional arguments to be passed to `lavaan::lavaan()`. See also `lavaan::lavOptions()`
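
The way `stopProp` and `stopValue` combine can be sketched in base R. The helper `meets_stop()` below is hypothetical and not part of semTools; it assumes, from the two argument descriptions above, that a pooled quantity counts as stable when its absolute change is smaller than the larger of `stopProp` × |previous value| and `stopValue`. The numbers are made up for illustration.

```r
# Hypothetical helper (not a semTools function): returns TRUE for each
# pooled quantity whose change between iterations is small enough to stop.
meets_stop <- function(prev, curr, stopProp, stopValue) {
  change <- abs(curr - prev)
  # Proportional tolerance, overridden by stopValue whenever that is larger
  tolerance <- pmax(stopProp * abs(prev), stopValue)
  change < tolerance
}

# Made-up pooled estimates from two consecutive iterations
prev <- c(loading = 0.80, variance = 0.05)
curr <- c(loading = 0.81, variance = 0.07)

# Both criteria active
meets_stop(prev, curr, stopProp = 0.03, stopValue = 0.03)

# Proportional criterion only (stopValue = 0)
meets_stop(prev, curr, stopProp = 0.03, stopValue = 0)
```

In this toy case both quantities pass when `stopValue = .03`, but the small variance fails under the proportional criterion alone, because 3% of 0.05 is a much tighter tolerance than .03. That is the situation in which `stopValue` takes over.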

Details

This function implements an algorithm for choosing the number of allocations (M; described in Sterba & Rights, 2016), pools point and standard-error estimates across these M allocations, and produces indices for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate.

To obtain pooled test statistics for model fit or model comparison, the list of parcel allocations can be passed to lavaan.mi::lavaan.mi() (find Examples on the help pages for parcelAllocation() and PAVranking()).

This function randomly generates a given number (nAllocStart) of item-to-parcel allocations, fits a SEM to each allocation, and then increments the number of allocations used (by nAllocAdd) until the pooled point and standard-error estimates fulfill stopping criteria (stopProp and stopValue, defined above). A summary of results from the model that was fit to the M allocations is returned.
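
The control flow described above can be illustrated with a toy base-R loop. This is a sketch under stated assumptions, not the function's implementation: the SEM fit is replaced by a simple stand-in statistic (the correlation between two parcel scores), the data are simulated, and the names `one_allocation()`, `pooled()`, and `maxAlloc` are hypothetical.

```r
set.seed(1)  # reproducible toy data

# Simulated item-level data: 9 items for each of two correlated factors
n  <- 200
f1 <- rnorm(n)
f2 <- 0.5 * f1 + rnorm(n, sd = sqrt(0.75))
items1 <- sapply(1:9, function(j) 0.7 * f1 + rnorm(n))
items2 <- sapply(1:9, function(j) 0.7 * f2 + rnorm(n))

# One random allocation: draw 3 of the 9 items per factor at random to
# form a parcel (mean of its items), then compute a stand-in statistic.
# A real run would fit the SEM to the parceled data instead.
one_allocation <- function() {
  p1 <- rowMeans(items1[, sample(9)[1:3]])
  p2 <- rowMeans(items2[, sample(9)[1:3]])
  cor(p1, p2)
}

# Pooled statistic from a fresh set of nAlloc allocations
pooled <- function(nAlloc) mean(replicate(nAlloc, one_allocation()))

nAlloc    <- 10    # plays the role of nAllocStart
nAllocAdd <- 10
stopProp  <- 0.03
stopValue <- 0.03
maxAlloc  <- 200   # safety cap for this sketch only

prev <- pooled(nAlloc)
repeat {
  nAlloc <- nAlloc + nAllocAdd
  curr   <- pooled(nAlloc)  # all-new allocations at each iteration
  stable <- abs(curr - prev) < max(stopProp * abs(prev), stopValue)
  if (stable || nAlloc >= maxAlloc) break
  prev <- curr
}
M <- nAlloc  # allocation count at the iteration where the loop stopped
```

The sketch tracks a single pooled quantity; the function itself applies the check to every pooled estimate and pooled standard error listed in `selectParam`.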

Additionally, this function outputs the proportion of allocations with solutions that converged (using a maximum likelihood estimator) as well as the proportion of allocations with solutions that were converged and proper. The converged and proper solutions among the final M allocations are used in computing pooled results.

Additionally, after each iteration of the algorithm, information useful in monitoring the algorithm is outputted: the number of allocations used at that iteration, the proportion of pooled parameter estimates meeting stopping criteria at the previous iteration, the proportion of pooled standard errors meeting stopping criteria at the previous iteration, and the runtime of that iteration. When stopping criteria are satisfied, the full set of results is outputted.

For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba & MacCallum (2010).

Value

`Estimates`	A table containing pooled results across M allocations at the iteration where stopping criteria were met. Columns correspond to individual parameter name, pooled estimate, pooled standard error, p value for a z test of the parameter, normal-theory 95% CI, p value for a t test of the parameter (using `df` described in Sterba & Rights, 2016), and t-based 95% CI for the parameter.
`Fit`	A table containing results related to model fit from the M allocations at the iteration where stopping criteria were met. Columns correspond to fit index names, the mean of each index across allocations, the SD of each fit index across allocations, the minimum, maximum and range of each fit index across allocations, and the percent of the M allocations where the chi-square test of absolute fit was significant.
`Proportions`	A table containing the proportion of the final M allocations that (a) met the optimizer convergence criteria and (b) converged to proper solutions. Note that pooled estimates, pooled standard errors, and other results are computed using only the converged, proper allocations.
`Stability`	The number of allocations (M) needed for stability, at which point the algorithm's stopping criteria (defined above) were met.
`Uncertainty`	Indices used to quantify uncertainty in estimates due to sample vs. allocation variability. A table containing individual parameter names, an estimate of the proportion of total variance of a pooled parameter estimate that is attributable to parcel-allocation variability (PPAV), and an estimate of the ratio of the between-allocation variance of a pooled parameter estimate to the within-allocation variance (RPAV). See Sterba & Rights (2016) for more detail.
`Time`	The total runtime of the function, in minutes. Note that the total runtime will be greater when the specified model encounters convergence problems for some allocations, as is the case with the `simParcel()` dataset used below.
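
As a rough illustration of how the pooled quantities and the uncertainty indices relate, the sketch below pools made-up results for one parameter in base R. It assumes Rubin-style pooling rules, with allocations playing the role of imputations, and takes the PPAV and RPAV definitions from the verbal descriptions above. Whether the finite-M factor (1 + 1/M) enters the indices exactly this way is an assumption; see Sterba & Rights (2016) for the definitions actually used.

```r
# Made-up results for one parameter across M = 5 allocations
est <- c(0.62, 0.58, 0.65, 0.60, 0.61)  # point estimates
se  <- c(0.10, 0.11, 0.10, 0.09, 0.10)  # standard errors
M   <- length(est)

pooled_est <- mean(est)
within     <- mean(se^2)   # sampling (within-allocation) variance
between    <- var(est)     # parcel-allocation (between-allocation) variance
total      <- within + (1 + 1 / M) * between
pooled_se  <- sqrt(total)

# Assumed index definitions (see lead-in)
ppav <- (1 + 1 / M) * between / total   # share of total variance due to allocation
rpav <- (1 + 1 / M) * between / within  # between-allocation relative to within

round(c(estimate = pooled_est, se = pooled_se, PPAV = ppav, RPAV = rpav), 3)
```

Under these definitions the two indices carry the same information on different scales, since PPAV = RPAV / (1 + RPAV).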

Author(s)

Jason D. Rights (Vanderbilt University; jason.d.rights@vanderbilt.edu)

The author would also like to credit Corbin Quick and Alexander Schoemann for providing the original parcelAllocation() function (prior to its revision by Terrence D. Jorgensen) on which this function is based.

References

Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073

Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302

Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502

Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067

Examples



library(semTools)  # provides poolMAlloc() and the simParcel example data

## lavaan syntax: a CFA model with two correlated factors,
## to be fit to parceled data

parmodel <- '
   f1 =~ NA*p1f1 + p2f1 + p3f1
   f2 =~ NA*p1f2 + p2f2 + p3f2
   p1f1 ~ 1
   p2f1 ~ 1
   p3f1 ~ 1
   p1f2 ~ 1
   p2f2 ~ 1
   p3f2 ~ 1
   p1f1 ~~ p1f1
   p2f1 ~~ p2f1
   p3f1 ~~ p3f1
   p1f2 ~~ p1f2
   p2f2 ~~ p2f2
   p3f2 ~~ p3f2
   f1 ~~ 1*f1
   f2 ~~ 1*f2
   f1 ~~ f2
'

## specify items for each factor
f1name <- colnames(simParcel)[1:9]
f2name <- colnames(simParcel)[10:18]

## run function
poolMAlloc(nPerPar = list(c(3,3,3), c(3,3,3)),
           facPlc = list(f1name, f2name), nAllocStart = 10, nAllocAdd = 10,
           syntax = parmodel, dataset = simParcel, stopProp = .03,
           stopValue = .03, selectParam = c(1:6, 13:18, 21),
           names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"),
           double = FALSE, useTotalAlloc = FALSE)


## See examples on ?parcelAllocation and ?PAVranking for how to obtain
## pooled test statistics and other pooled lavaan output.
## Details provided in Sterba & Rights (2016).

semTools documentation built on April 3, 2025, 9:23 p.m.