SEMSummary: Summary Statistics for a SEM Analysis
In JWiley/JWileymisc: Miscellaneous Utilities and Functions

SEMSummary

R Documentation

Summary Statistics for a SEM Analysis

Description

This function is designed to calculate the descriptive statistics and summaries that are often reported on raw data when the main analyses use structural equation modelling.

Usage

SEMSummary(
  formula,
  data,
  use = c("fiml", "pairwise.complete.obs", "complete.obs")
)

Arguments

`formula`	A formula of the variables to be used in the analysis. See the ‘details’ section for more information.
`data`	A data frame, matrix, or list containing the variables used in the formula. This is a required argument.
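`use`	A character string specifying how to handle missing data: “fiml” (the default), “pairwise.complete.obs”, or “complete.obs”. The examples also pass abbreviations such as “pair” and “comp”.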
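The abbreviated values of `use` in the examples (“pair”, “comp”) are consistent with the usual base R `match.arg()` idiom for an argument whose default is a vector of choices. The sketch below illustrates that idiom only; `resolve_use()` is a hypothetical helper, not part of JWileymisc, and it is an assumption that SEMSummary resolves `use` in the same way.

```r
# Hypothetical helper (not part of JWileymisc) illustrating match.arg()
resolve_use <- function(use = c("fiml", "pairwise.complete.obs", "complete.obs")) {
  # match.arg() returns the first choice when `use` is not supplied,
  # and expands an unambiguous abbreviation to the full choice
  match.arg(use)
}

default_use  <- resolve_use()        # no value supplied: first choice
pairwise_use <- resolve_use("pair")  # abbreviation of "pairwise.complete.obs"
complete_use <- resolve_use("comp")  # abbreviation of "complete.obs"
```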

Details

This function calculates a variety of relevant statistics on the raw data used in a SEM analysis. Because it is meant for SEM style data, for now it expects all variables to be numeric. In the future I may try to expand it to handle factor variables somehow.

Both the formula and data arguments are required. The formula should have a right-hand side only. The most common form is a list of variable names separated by ‘+’ signs. For convenience, a ‘.’ is expanded to mean “all variables in the data set”, which is considerably easier to write when there are many variables or when a whole dataset is being analyzed. It also makes column indexing straightforward: pass a subset of the data (e.g., data[, 1:10]) and use the ‘.’ expansion to analyze the first 10 columns. The examples section demonstrates this use.
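To make the two formula styles concrete, here is a minimal base R sketch of how a right-hand-side formula names variables and how ‘.’ can be expanded against a data frame. It uses only `all.vars()` and `terms()` from base R; whether SEMSummary performs the expansion this way internally is an assumption.

```r
# Explicit variable names separated by "+"
f_named <- ~ Sepal.Length + Sepal.Width + Petal.Length
vars_named <- all.vars(f_named)

# "." expanded against a subset of columns selected by position
sub <- mtcars[, 1:3]
vars_dot <- attr(terms(~ ., data = sub), "term.labels")

vars_named
vars_dot
```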

Also noteworthy is that SEMSummary is not really meant to be used on its own. It is the computational workhorse, but it is meant to be used with a styling or printing method to produce simple output. APAStyler has methods for SEMSummary output.

Missing data can be handled in three ways, selected with the use argument: listwise deletion (“complete.obs”), pairwise deletion (“pairwise.complete.obs”), and the EM algorithm (“fiml”), which is the default.
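The difference between listwise and pairwise deletion can be sketched with base R's `cov()`, which accepts the same two option names. This is only an illustration of the two deletion rules, not the package's own code, and the EM-based default is not reproduced here because base R has no equivalent.

```r
# Numeric iris columns with 10% of cells set to missing
X <- as.matrix(iris[, -5])
set.seed(10)
X[sample(length(X), length(X) * 0.10)] <- NA

# Listwise deletion: only rows with no missing values contribute
S_listwise <- cov(X, use = "complete.obs")
n_listwise <- sum(complete.cases(X))

# Pairwise deletion: each covariance uses the rows where both variables are present
S_pairwise <- cov(X, use = "pairwise.complete.obs")
present <- 1 * !is.na(X)          # 1 = observed, 0 = missing
n_pairwise <- crossprod(present)  # number of rows available for each pair
```

Each entry of `n_pairwise` is at least as large as `n_listwise`, which is why pairwise deletion discards less information than listwise deletion.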

Value

A list with S3 class “SEMSummary” containing the following elements:

`names`	A character vector containing the variable names.
`n`	An integer vector of the length of each variable used (this includes available and missing data).
`nmissing`	An integer vector of the number of missing values in each variable.
`mu`	A vector of the arithmetic means of each variable (on complete data).
`stdev`	A numeric vector of the standard deviations of each variable (on complete data).
`Sigma`	The numeric covariance matrix for all variables.
`sSigma`	The numeric correlation matrix for all variables.
`coverage`	A numeric matrix giving the proportion (a decimal between 0 and 1, not a percentage) of information available for each pairwise covariance/correlation.
`pvalue`	The two-sided p values for the correlation matrix. The pairwise-present N is used to calculate the degrees of freedom.
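As a rough illustration of what `n`, `nmissing`, `coverage`, and `pvalue` describe, the base R sketch below computes analogous quantities by hand. It assumes pairwise correlations and the standard t test for a correlation with N - 2 degrees of freedom; the documentation states only that the pairwise-present N is used for the degrees of freedom, so the exact computation inside SEMSummary may differ.

```r
# Numeric iris columns with 10% of cells set to missing
X <- as.matrix(iris[, -5])
set.seed(10)
X[sample(length(X), length(X) * 0.10)] <- NA

n <- rep(nrow(X), ncol(X))      # length of each variable, missing included
nmissing <- colSums(is.na(X))   # missing values per variable

present <- 1 * !is.na(X)        # 1 = observed, 0 = missing
n_pair <- crossprod(present)    # pairwise-present N
coverage <- n_pair / nrow(X)    # proportion of rows available for each pair

# Two-sided p values from the t distribution, df = pairwise N - 2 (assumed)
r <- cor(X, use = "pairwise.complete.obs")
df <- n_pair - 2
tstat <- r * sqrt(df / (1 - r^2))
p <- matrix(2 * pt(-abs(tstat), df), nrow = nrow(r), dimnames = dimnames(r))
diag(p) <- NA                   # a variable's correlation with itself is not tested
```

Under these assumptions, each off-diagonal entry of `p` matches the p value that `cor.test()` reports for the same pair of columns.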

Examples

## Example using the built in iris dataset
s <- SEMSummary(~ Sepal.Length + Sepal.Width + Petal.Length, data = iris)
s # show output ... not very nice

## Prettier output from SEMSummary
APAStyler(s)

#### Subset the dataset and use the . expansion ####

## summary for all variables in mtcars data set
## with 11 variables, this could be a pain to write out
SEMSummary(~ ., data = mtcars)

## . expansion is also useful when we know column positions
## but not necessarily names
SEMSummary(~ ., data = mtcars[, c(1, 2, 3, 9, 10, 11)])

## clean up
rm(s)

## sample data
Xmiss <- as.matrix(iris[, -5])
# make 10% missing completely at random
set.seed(10)
Xmiss[sample(length(Xmiss), length(Xmiss) * .10)] <- NA
Xmiss <- as.data.frame(Xmiss)

SEMSummary(~ ., data = Xmiss, use = "fiml")


## pairwise
APAStyler(SEMSummary(~ ., data = Xmiss, use = "pair"),
  type = "cor")

## same as cor()
cor(Xmiss, use = "pairwise.complete.obs")

## complete cases only
SEMSummary(~ ., data = Xmiss, use = "comp")

## clean up
rm(Xmiss)

JWiley/JWileymisc documentation built on June 9, 2025, 3:42 a.m.