poolMAlloc: Pooled estimates and standard errors across M...

poolMAlloc    R Documentation

Pooled estimates and standard errors across M parcel-allocations: Combining sampling variability and parcel-allocation variability.

Description

This function uses an iterative algorithm to choose the number of random item-to-parcel allocations needed to meet user-defined stability criteria for a fitted structural equation model (SEM); see Details for more information. Pooled point and standard-error estimates from this SEM can be output at the final selected number of allocations (however, it is more efficient to save the allocations and treat them as multiple imputations using runMI; see See Also for links to examples). Additionally, new indices (see Sterba & Rights, 2016) are output for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate. At each iteration, this function generates a given number of random item-to-parcel allocations, fits a SEM to each allocation, pools estimates across allocations from that iteration, and then assesses whether stopping criteria are met. If stopping criteria are not met, the algorithm increments the number of allocations used (generating all new allocations).

Usage

poolMAlloc(nPerPar, facPlc, nAllocStart, nAllocAdd = 0,
  parceloutput = NULL, syntax, dataset, stopProp, stopValue,
  selectParam = NULL, indices = "default", double = FALSE,
  checkConv = FALSE, names = "default", leaveout = 0,
  useTotalAlloc = FALSE, ...)

Arguments

nPerPar

A list in which each element is a vector, corresponding to each factor, indicating sizes of parcels. If variables are left out of parceling, they should not be accounted for here (i.e., there should not be parcels of size "1").

facPlc

A list of vectors, each corresponding to a factor, specifying the item indicators of that factor (whether included in parceling or not). Either variable names or column numbers. Variables not listed will not be modeled or included in output datasets.

nAllocStart

The number of random allocations of items to parcels to generate in the first iteration of the algorithm.

nAllocAdd

The number of allocations to add with each iteration of the algorithm. Note that if only one iteration is desired, nAllocAdd can be set to 0 and results will be output for nAllocStart allocations only.

parceloutput

Optional character. Path (folder/directory) where M (the final selected number of allocations) parceled data sets will be outputted from the iteration where the algorithm met stopping criteria. Note for Windows users: file path must be specified using forward slashes (/), not backslashes (\). See path.expand for details. If NULL (default), nothing is saved to disk.

syntax

lavaan syntax that defines the model.

dataset

Item-level dataset.

stopProp

Value used in defining stopping criteria of the algorithm (δ_a in Sterba & Rights, 2016). This is the minimum proportion of change (in any pooled parameter or pooled standard error estimate listed in selectParam) that is allowable from one iteration of the algorithm to the next. That is, change in pooled estimates and pooled standard errors from one iteration to the next must all be less than (stopProp) x (value from former iteration). Note that stopValue can override this criterion (see below). Also note that values less than .01 are unlikely to lead to more substantively meaningful precision. Also note that if only stopValue is a desired criterion, stopProp can be set to 0.

stopValue

Value used in defining stopping criteria of the algorithm (δ_b in Sterba & Rights, 2016). stopValue is a minimum allowable amount of absolute change (in any pooled parameter or pooled standard error estimate listed in selectParam) from one iteration of the algorithm to the next. For a given pooled estimate or pooled standard error, stopValue is only invoked as a stopping criterion when the minimum change required by stopProp is less than stopValue. Note that values less than .01 are unlikely to lead to more substantively meaningful precision. Also note that if only stopProp is a desired criterion, stopValue can be set to 0.

selectParam

(Optional) A list of the pooled parameters to be used in defining stopping criteria (i.e., stopProp and stopValue). These parameters should appear in the order they are listed in the lavaan syntax. By default, all pooled parameters are used. Note that selectParam should only contain freely-estimated parameters. In one example from Sterba & Rights (2016) selectParam included all free parameters except item intercepts and in another example selectParam included only structural parameters.

indices

Optional character vector indicating the names of available fitMeasures to be included in the output. The first and second elements should be a chi-squared test statistic and its associated degrees of freedom, both of which will be added if missing. If "default", the indices will be c("chisq", "df", "cfi", "tli", "rmsea", "srmr"). If a robust test statistic is requested (see lavOptions), c("chisq", "df") will be replaced by c("chisq.scaled", "df.scaled"). For the output to include both the naive and robust test statistics, indices should include both, with the scaled test statistics listed first, as in indices = c("chisq.scaled", "df.scaled", "chisq", "df").

double

(Optional) If set to TRUE, requires stopping criteria (stopProp and stopValue) to be met for all parameters (in selectParam) for two consecutive iterations of the algorithm. By default, this is set to FALSE, meaning stopping criteria need only be met at one iteration of the algorithm.

checkConv

(Optional) If set to TRUE, function will output pooled estimates and standard errors from 10 iterations post-convergence.

names

(Optional) A character vector containing the names of parceled variables.

leaveout

(Optional) A vector of variables to be left out of randomized parceling. Either variable names or column numbers are allowed.

useTotalAlloc

(Optional) If set to TRUE, function will output a separate set of results that uses all allocations created by the algorithm, rather than M allocations (see "Allocations needed for stability" below). This distinction is further discussed in Sterba and Rights (2016).

...

Additional arguments to be passed to lavaan. See also lavOptions.

Details

For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba & MacCallum (2010).
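As a rough illustration of what a single random item-to-parcel allocation involves, the sketch below shuffles the items of one factor, splits them into parcels of the requested sizes, and scores each parcel as the mean of its items. The helper random_parcels and the toy data are hypothetical and are not part of semTools; scoring parcels as item means is an assumption made for this sketch.

```r
## Hypothetical helper (not part of semTools): one random item-to-parcel
## allocation for a single factor, with parcels scored as item means.
random_parcels <- function(data, items, sizes) {
  stopifnot(sum(sizes) == length(items))
  shuffled  <- sample(items)                          # random item order
  parcel_id <- rep(seq_along(sizes), times = sizes)   # parcel label per item
  groups    <- split(shuffled, parcel_id)             # items in each parcel
  parcels   <- sapply(groups, function(g) rowMeans(data[, g, drop = FALSE]))
  as.data.frame(parcels)
}

## Toy data: 20 cases, 9 items for one factor (made up for illustration)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(20 * 9), nrow = 20,
                            dimnames = list(NULL, paste0("x", 1:9))))

## Three parcels of three items each, analogous to nPerPar = list(c(3,3,3))
p <- random_parcels(toy, items = names(toy), sizes = c(3, 3, 3))
dim(p)
```

Calling random_parcels repeatedly yields different parcel compositions, which is the source of the parcel-allocation variability discussed in the references above.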

This function implements an algorithm for choosing the number of allocations (M; described in Sterba & Rights, 2016), pools point and standard-error estimates across these M allocations, and produces indices for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate.

To obtain pooled test statistics for model fit or model comparison, the list of parcel allocations can be passed to runMI (find Examples on the help pages for parcelAllocation and PAVranking).

This function randomly generates a given number (nAllocStart) of item-to-parcel allocations, fits a SEM to each allocation, and then increments the number of allocations used (by nAllocAdd) until the pooled point and standard-error estimates fulfill stopping criteria (stopProp and stopValue, defined above). A summary of results from the model that was fit to the M allocations is returned.
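The interplay of stopProp and stopValue can be sketched as follows. This is one reading of the argument descriptions above, not the package's internal code: a pooled value is treated as stable when its change from the former iteration is smaller than the larger of stopProp times the former value and stopValue. The helper meets_stop is hypothetical, and taking the absolute value of the former estimate is an assumption of this sketch.

```r
## Hypothetical helper (not part of semTools): TRUE when every pooled value
## changed by less than the larger of (stopProp x |former value|) and stopValue.
meets_stop <- function(old, new, stopProp, stopValue) {
  allowed <- pmax(stopProp * abs(old), stopValue)  # per-parameter tolerance
  all(abs(new - old) < allowed)
}

## Made-up pooled estimates from two consecutive iterations
old_est <- c(0.80, 0.10, 1.50)
new_est <- c(0.81, 0.12, 1.52)

meets_stop(old_est, new_est, stopProp = .03, stopValue = .03)  # TRUE
meets_stop(old_est, new_est, stopProp = .03, stopValue = 0)    # FALSE
```

In the second call the small estimate (0.10) may change by at most 0.003 under stopProp alone, so the rule fails; a nonzero stopValue keeps parameters near zero from demanding impractically many allocations.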

Additionally, this function outputs the proportion of allocations with solutions that converged (using a maximum likelihood estimator) as well as the proportion of allocations with solutions that were converged and proper. The converged and proper solutions among the final M allocations are used in computing pooled results.

Additionally, after each iteration of the algorithm, information useful in monitoring the algorithm is output: the number of allocations used at that iteration, the proportion of pooled parameter estimates meeting stopping criteria at the previous iteration, the proportion of pooled standard errors meeting stopping criteria at the previous iteration, and the runtime of that iteration. When stopping criteria are satisfied, the full set of results is output.

Value

Estimates

A table containing pooled results across M allocations at the iteration where stopping criteria were met. Columns correspond to individual parameter name, pooled estimate, pooled standard error, p-value for a z-test of the parameter, z-based 95% confidence interval, p-value for a t-test of the parameter (using degrees of freedom described in Sterba & Rights, 2016), and t-based 95% confidence interval for the parameter.

Fit

A table containing results related to model fit from the M allocations at the iteration where stopping criteria were met. Columns correspond to fit index names, the average of each index across allocations, the standard deviation of each fit index across allocations, the maximum of each fit index across allocations, the minimum of each fit index across allocations, the range of each fit index across allocations, and the percent of the M allocations where the chi-square test of absolute fit was significant.

Proportion of converged and proper allocations

A table containing the proportion of the final M allocations that converged (using a maximum likelihood estimator) and the proportion of allocations that converged to proper solutions. Note that pooled estimates, pooled standard errors, and other results are computed using only the converged, proper allocations.

Allocations needed for stability (M)

The number of allocations (M) at which the algorithm's stopping criteria (defined above) were met.

Indices used to quantify uncertainty in estimates due to sample vs. allocation variability

A table containing individual parameter names, an estimate of the proportion of total variance of a pooled parameter estimate that is attributable to parcel-allocation variability (PPAV), and an estimate of the ratio of the between-allocation variance of a pooled parameter estimate to the within-allocation variance (RPAV). See Sterba & Rights (2016) for more detail.

Total runtime (minutes)

The total runtime of the function, in minutes. Note that the total runtime will be greater when the specified model encounters convergence problems for some allocations, as is the case with the simParcel dataset used below.
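The pooled estimates and the PPAV and RPAV indices listed above can be illustrated with a small sketch. It assumes multiple-imputation-style pooling, in which the between-allocation variance is inflated by (1 + 1/M) before being combined with the average within-allocation (sampling) variance; the exact formulas used by the function are given in Sterba & Rights (2016) and may differ in detail. The helper pool_param and the input values are hypothetical.

```r
## Hypothetical helper (not part of semTools): pool one parameter across
## M allocations, assuming multiple-imputation-style pooling rules.
pool_param <- function(est, se) {
  M     <- length(est)
  W     <- mean(se^2)        # within-allocation (sampling) variance
  B     <- var(est)          # between-allocation variance
  B_adj <- (1 + 1 / M) * B   # between-allocation variance, inflated by 1 + 1/M
  total <- W + B_adj         # total variance of the pooled estimate
  c(estimate = mean(est),
    se       = sqrt(total),
    PPAV     = B_adj / total,  # share of total variance due to allocations
    RPAV     = B_adj / W)      # allocation variance relative to sampling variance
}

## Made-up estimates and standard errors from M = 5 allocations
est <- c(0.50, 0.54, 0.46, 0.52, 0.48)
se  <- c(0.10, 0.10, 0.10, 0.10, 0.10)
res <- pool_param(est, se)
round(res, 3)
```

Under these assumptions, a PPAV near 0 indicates that nearly all uncertainty in the pooled estimate comes from sampling variability, whereas larger values indicate that the choice of allocation contributes materially.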

Author(s)

Jason D. Rights (Vanderbilt University; jason.d.rights@vanderbilt.edu)

The author would also like to credit Corbin Quick and Alexander Schoemann for providing the original parcelAllocation function on which this function is based.

References

Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi: 10.1080/10705511.2011.607073

Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi: 10.1080/00273171003680302

Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi: 10.1080/00273171.2016.1144502

Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi: 10.1037/met0000067

See Also

runMI for treating allocations as multiple imputations to pool results across allocations. See Examples on help pages for:

  • parcelAllocation for fitting a single model

  • PAVranking for comparing 2 models

Examples


## Not run: 
## lavaan syntax: a CFA model with two correlated factors,
## to be fit to parceled data

parmodel <- '
   f1 =~ NA*p1f1 + p2f1 + p3f1
   f2 =~ NA*p1f2 + p2f2 + p3f2
   p1f1 ~ 1
   p2f1 ~ 1
   p3f1 ~ 1
   p1f2 ~ 1
   p2f2 ~ 1
   p3f2 ~ 1
   p1f1 ~~ p1f1
   p2f1 ~~ p2f1
   p3f1 ~~ p3f1
   p1f2 ~~ p1f2
   p2f2 ~~ p2f2
   p3f2 ~~ p3f2
   f1 ~~ 1*f1
   f2 ~~ 1*f2
   f1 ~~ f2
'

## specify items for each factor
f1name <- colnames(simParcel)[1:9]
f2name <- colnames(simParcel)[10:18]

## run function
poolMAlloc(nPerPar = list(c(3,3,3), c(3,3,3)),
           facPlc = list(f1name, f2name), nAllocStart = 10, nAllocAdd = 10,
           syntax = parmodel, dataset = simParcel, stopProp = .03,
           stopValue = .03, selectParam = c(1:6, 13:18, 21),
           names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"),
           double = FALSE, useTotalAlloc = FALSE)

## End(Not run)

## See examples on ?parcelAllocation and ?PAVranking for how to obtain
## pooled test statistics and other pooled lavaan output.
## Details provided in Sterba & Rights (2016).


semTools documentation built on May 10, 2022, 9:05 a.m.