fmi: Fraction of Missing Information.


fmi    R Documentation



This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.


fmi(data, method = "saturated", group = NULL, ords = NULL,
  varnames = NULL, exclude = NULL, fewImps = FALSE)



data: Either a single data.frame with incomplete observations, or a list of imputed data sets.


method: character. If "saturated" or "sat" (default), the model used to estimate FMI is a freely estimated covariance matrix and mean vector for numeric variables, and/or polychoric correlations and thresholds for ordered categorical variables, for each group (if applicable). If "null", only means and variances are estimated for numeric variables, and/or thresholds for ordered categorical variables (i.e., covariances and/or polychoric correlations are constrained to zero). See Details for more information.


group: character. The optional name of a grouping variable, to request FMI in each group.


ords: character. Optional vector of names of ordered-categorical variables that are not already stored as class ordered in data.


varnames: character. Optional vector of variable names, to calculate FMI for a subset of variables in data. By default, all numeric and ordered variables will be included, unless data is a single incomplete data.frame, in which case only numeric variables can be used with FIML estimation. Other variable types will be removed.


exclude: character. Optional vector of variable names to exclude from the analysis.


fewImps: logical. If TRUE, use the estimate of FMI that applies a correction to the estimated between-imputation variance. Recommended when there are few imputations; makes little difference when there are many imputations. Ignored when data is not a list of imputed data sets.
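
The kind of correction that fewImps refers to can be illustrated with the pooling rules in Rubin (1987). The following is a minimal base-R sketch, not the semTools implementation: pool_fmi is a hypothetical helper, the input numbers are made up, and whether fmi() uses exactly these formulas internally is an assumption. It contrasts the simple estimate of FMI with the version adjusted for a finite number of imputations.

```r
## Hypothetical helper (NOT part of semTools): pool one parameter across
## m imputations using Rubin's (1987) rules and return two FMI estimates.
## est = point estimates, se2 = squared standard errors, one per imputation.
pool_fmi <- function(est, se2) {
  m  <- length(est)
  W  <- mean(se2)               # within-imputation variance
  B  <- var(est)                # between-imputation variance
  Tv <- W + (1 + 1/m) * B       # total variance
  r  <- (1 + 1/m) * B / W       # relative increase in variance
  nu <- (m - 1) * (1 + 1/r)^2   # Rubin's degrees of freedom
  c(simple   = (1 + 1/m) * B / Tv,            # equals r / (r + 1)
    adjusted = (r + 2 / (nu + 3)) / (r + 1))  # finite-m adjustment
}

## made-up estimates from m = 3 imputations
pool_fmi(est = c(0.48, 0.52, 0.55), se2 = c(0.010, 0.011, 0.009))
```

As the number of imputations grows, nu grows and the two values converge, which is consistent with the note that the correction matters mainly when there are few imputations.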


The function estimates a saturated model with lavaan for a single incomplete data set using FIML, or with lavaan.mi for a list of imputed data sets. If method = "saturated", FMI will be estimated for all summary statistics, which can take a long time for data sets with many variables. If method = "null", FMI will only be estimated for univariate statistics (e.g., means, variances, thresholds). Because the saturated model gives more reliable estimates, another way to reduce computing time with a large data set is to keep method = "saturated" and request FMI for only a subset of variables.
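
To give a rough sense of why method = "saturated" is slower, the sketch below counts the summary statistics per group when all variables are numeric. n_stats is a hypothetical helper written for illustration only; it is not part of semTools.

```r
## Hypothetical helper (NOT part of semTools): number of summary statistics
## per group for p numeric variables under each method.
n_stats <- function(p) {
  c(saturated = p + p * (p + 1) / 2,  # p means + p(p + 1)/2 (co)variances
    null      = 2 * p)                # p means + p variances
}

n_stats(9)   # e.g., x1-x9 in the Holzinger-Swineford data
n_stats(50)  # a larger data set
```

The saturated count grows quadratically with the number of variables (54 versus 18 statistics for 9 variables), so restricting the analysis to a subset of variables shrinks the saturated model considerably.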


fmi returns a list with at least 2 of the following:


Covariances: A list of symmetric matrices: (1) the estimated/pooled covariance matrix, or a list of group-specific matrices (if applicable) and (2) a matrix of FMI, or a list of group-specific matrices (if applicable). Only available if method = "saturated".


Variances: The estimated/pooled variance for each numeric variable. Only available if method = "null" (otherwise, it is on the diagonal of Covariances).


Means: The estimated/pooled mean for each numeric variable.


Thresholds: The estimated/pooled threshold(s) for each ordered-categorical variable.


message: A message indicating caution when the null model is used.


Mauricio Garnier Villarreal (University of Kansas); Terrence Jorgensen (University of Amsterdam)


Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477–494. doi: 10.1080/10705511.2012.687669

Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223–243. doi: 10.1093/poq/nfq007


HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo", "school")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, group = "school"))

## Not run: 
## ordered-categorical data
lapply(datCat, class)
## impose missing values
for (i in 1:8) datCat[sample(1:nrow(datCat), size = .1*nrow(datCat)), i] <- NA
## impute data m = 3 times (amelia() comes from the Amelia package)
library(Amelia)
impout <- amelia(datCat, m = 3, noms = "g", ords = paste0("u", 1:8), p2s = FALSE)
imps <- impout$imputations
## calculate FMI, using list of imputed data sets
fmi(imps, group = "g")

## End(Not run)

semTools documentation built on May 10, 2022, 9:05 a.m.