fmi: Fraction of Missing Information.
In semTools: Useful Tools for Structural Equation Modeling

fmi	R Documentation

Description

This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.

Usage

fmi(data, method = "saturated", group = NULL, ords = NULL,
  varnames = NULL, exclude = NULL, return.fit = FALSE)

Arguments

`data`	Either a single `data.frame` with incomplete observations, or a `list` of imputed data sets.
`method`	`character`. If `"saturated"` or `"sat"` (default), the model used to estimate FMI is a freely estimated covariance matrix and mean vector for numeric variables, and/or polychoric correlations and thresholds for ordered categorical variables, for each group (if applicable). If `"null"`, only means and variances are estimated for numeric variables, and/or thresholds for ordered categorical variables (i.e., covariances and/or polychoric/polyserial correlations are constrained to zero). See Details for more information.
`group`	`character`. The optional name of a grouping variable, to request FMI in each group.
`ords`	Optional `character` vector naming ordered-categorical variables, if they are not already stored as class `ordered` in `data`.
`varnames`	Optional `character` vector of variable names, to calculate FMI for a subset of variables in `data`. By default, all numeric and `ordered` variables will be included, unless `data` is a single incomplete `data.frame`, in which case only numeric variables can be used with FIML estimation. Other variable types will be removed.
`exclude`	Optional `character` vector naming variables to exclude from the analysis.
`return.fit`	`logical`. If `TRUE`, the fitted `lavaan::lavaan` or `lavaan.mi::lavaan.mi` model is returned, so FMI can be found from `summary(..., fmi = TRUE)`.
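
As a minimal sketch of the `ords` argument, the base-R code below shows the two situations it distinguishes: variables already stored as class `ordered`, and numeric codes that would have to be named in `ords`. The data and the names `imps`, `imps_ord`, `items`, `item1`, `item2`, and `score` are hypothetical, and the `fmi()` call is shown only as a comment.

```r
## Hypothetical list of two imputed data sets (complete, as imputations are).
imps <- list(
  data.frame(item1 = c(1, 2, 3, 2, 1), item2 = c(0, 1, 1, 0, 0),
             score = c(2.5, 3.1, 3.8, 4.0, 3.3)),
  data.frame(item1 = c(1, 2, 3, 2, 2), item2 = c(0, 1, 1, 1, 0),
             score = c(2.5, 3.1, 3.6, 4.0, 3.3))
)

## The items are stored as numeric codes, so none is of class "ordered" yet.
sapply(imps[[1]], is.ordered)

## Option A: store the items as ordered factors in every imputed data set,
## so that they no longer need to be named in ords.
items <- c("item1", "item2")
imps_ord <- lapply(imps, function(d) {
  d[items] <- lapply(d[items], ordered)
  d
})
sapply(imps_ord[[1]], is.ordered)

## Option B (not run): leave the data as they are and name the items instead.
## fmi(imps, ords = items)
```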

Details

The function estimates a saturated model with lavaan::lavaan() for a single incomplete data set using FIML, or with lavaan.mi::lavaan.mi() for a list of imputed data sets. If method = "saturated", FMI will be estimated for all summary statistics, which can take a long time with large data sets. If method = "null", FMI will only be estimated for univariate statistics (e.g., means, variances, thresholds). Because the saturated model gives more reliable estimates, another way to save time with a large data set is to keep method = "saturated" and request only a subset of variables.
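
To make the estimated quantity concrete, the sketch below computes an approximate FMI for a single mean from a list of imputed data sets using Rubin's (1987) pooling rules: the share of the total sampling variance that is due to between-imputation variability. It is an illustration under stated assumptions, not the computation fmi() performs, which fits lavaan models as described above. The helper name pooled_mean_fmi, the toy data, and the crude imputation step are all hypothetical, and the formula omits the small-sample degrees-of-freedom correction.

```r
## Hypothetical helper, not part of semTools: approximate FMI for the mean of
## one variable, pooled over a list of imputed data sets with Rubin's rules.
pooled_mean_fmi <- function(imps, name) {
  m   <- length(imps)                                        # number of imputations
  est <- sapply(imps, function(d) mean(d[[name]]))           # per-imputation means
  se2 <- sapply(imps, function(d) var(d[[name]]) / nrow(d))  # their squared SEs
  W   <- mean(se2)                  # within-imputation variance
  B   <- var(est)                   # between-imputation variance
  Tot <- W + (1 + 1 / m) * B        # total variance of the pooled mean
  c(estimate = mean(est), fmi = (1 + 1 / m) * B / Tot)
}

## Toy data: 30% of y is missing, then filled in by a crude random draw from
## the observed distribution (an assumption made only to get varying data sets).
set.seed(1)
y <- rnorm(100)
y[1:30] <- NA
imps <- lapply(1:5, function(i) {
  miss  <- is.na(y)
  y_imp <- y
  y_imp[miss] <- rnorm(sum(miss), mean(y, na.rm = TRUE), sd(y, na.rm = TRUE))
  data.frame(y = y_imp)
})

pooled_mean_fmi(imps, "y")
```

If every imputed data set were identical, the between-imputation variance and therefore this approximate FMI would be zero; the more the imputations disagree, the closer it moves to one.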

Value

fmi() returns a list with at least 2 of the following:

`Covariances`	A list of symmetric matrices: (1) the estimated/pooled covariance matrix, or a list of group-specific matrices (if applicable) and (2) a matrix of FMI, or a list of group-specific matrices (if applicable). Only available if `method = "saturated"`. When `method="cor"`, this element is replaced by `Correlations`.
`Variances`	The estimated/pooled variance for each numeric variable. Only available if `method = "null"` (otherwise, it is on the diagonal of Covariances).
`Means`	The estimated/pooled mean for each numeric variable.
`Thresholds`	The estimated/pooled threshold(s) for each ordered-categorical variable.
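
A short sketch of how the returned list might be inspected, based only on the element descriptions above. It assumes semTools is installed and that loading it also attaches lavaan with its HolzingerSwineford1939 data. The object names `dat` and `out` are illustrative, and the positional index `[[2]]` relies on the ordering given for `Covariances` rather than on any element name.

```r
if (requireNamespace("semTools", quietly = TRUE)) {
  library(semTools)   # assumed to attach lavaan, which supplies the example data
  dat <- HolzingerSwineford1939[ , paste0("x", 1:3)]
  dat$x1[seq(1, nrow(dat), by = 4)] <- NA   # make every 4th x1 value missing

  out <- fmi(dat)                 # default method = "saturated"
  print(names(out))               # which elements were returned
  str(out, max.level = 2)         # structure of each element
  print(out$Covariances[[2]])     # documented above as the matrix of FMI
}
```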

Author(s)

Mauricio Garnier Villarreal (Vrije Universiteit Amsterdam; m.garniervillarreal@vu.nl)

Terrence Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

References

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477–494. doi:10.1080/10705511.2012.687669

Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223–243. doi:10.1093/poq/nfq007

Examples


library(semTools) # also attaches lavaan, which provides HolzingerSwineford1939

HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9),
                                      "ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, method = "cor", group = "school")) # correlations by group

## significance tests in lavaan(.mi) object
out5 <- fmi(HSMiss, method = "cor", return.fit = TRUE)
summary(out5) # factor loading == SD, covariance = correlation

if(requireNamespace("lavaan.mi")){
  ## ordered-categorical data
  data(binHS5imps, package = "lavaan.mi")

  ## calculate FMI, using list of imputed data sets
  fmi(binHS5imps, group = "school")
}

semTools documentation built on April 3, 2025, 9:23 p.m.