hmda.efa: Perform Exploratory Factor Analysis with HMDA
In HMDA: Holistic Multimodel Domain Analysis for Exploratory Machine Learning

hmda.efa

R Documentation

Description

Performs exploratory factor analysis (EFA) on a specified set of features from a data frame using the psych package. The function optionally runs parallel analysis to recommend the number of factors, applies a rotation method, reverses specified features, and cleans up factor loadings by zeroing out values below a threshold. It then computes factor scores and reliability estimates, and finally returns a list containing the EFA results, cleaned loadings, reliability metrics, and factor correlations.

Usage

hmda.efa(
  df,
  features,
  algorithm = "minres",
  rotation = "promax",
  parallel.analysis = TRUE,
  nfactors = NULL,
  dict = dictionary(df, attribute = "label"),
  minimum_loadings = 0.3,
  exclude_features = NULL,
  ignore_binary = TRUE,
  intercorrelation = 0.3,
  reverse_features = NULL,
  plot = FALSE,
  factor_names = NULL,
  verbose = TRUE
)

Arguments

`df`	A data frame containing the items for EFA.
`features`	A vector of feature names (or indices) in `df` to include in the factor analysis.
`algorithm`	Character. The factor extraction method to use. Default is `"minres"`. Other methods supported by psych (e.g., `"ml"`, `"minchi"`) may also be used.
`rotation`	Character. The rotation method to apply to the factor solution. Default is `"promax"`.
`parallel.analysis`	Logical. If `TRUE`, runs parallel analysis using `psych::fa.parallel` to recommend the number of factors. Default is `TRUE`.
`nfactors`	Integer. The number of factors to extract. If `NULL` and `parallel.analysis = TRUE`, the number of factors recommended by the parallel analysis is used. Default is `NULL`.
`dict`	A data frame dictionary with at least two columns: `"name"` and `"description"`. Used to replace feature names with human-readable labels. Default is `dictionary(df, attribute = "label")`.
`minimum_loadings`	Numeric. Any factor loading whose absolute value is below this threshold is set to zero. Default is `0.3`.
`exclude_features`	Character vector. Features to exclude from the analysis. Default is `NULL`.
`ignore_binary`	Logical. If `TRUE`, binary items may be ignored in the analysis. Default is `TRUE`.
`intercorrelation`	Numeric. (Unused in current version) Intended to set a minimum intercorrelation threshold between items. Default is `0.3`.
`reverse_features`	A vector of feature names for which the scoring should be reversed prior to analysis. Default is `NULL`.
`plot`	Logical. If `TRUE`, a factor diagram is plotted using `psych::fa.diagram`. Default is `FALSE`.
`factor_names`	Character vector. Optional names to assign to the extracted factors (i.e., new column names for loadings). Default is `NULL`.
`verbose`	Logical. If `TRUE`, the factor loadings are printed to the console. Default is `TRUE`.
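
The effect of `minimum_loadings` can be illustrated with a minimal base R sketch. The loading matrix below is invented for illustration, and the code is an assumption about the general technique rather than the package's internal implementation.

```r
# Hypothetical loading matrix (3 items, 2 factors); values are made up.
loadings <- matrix(
  c( 0.72,  0.10,
    -0.25,  0.64,
     0.31, -0.45),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("item1", "item2", "item3"), c("F1", "F2"))
)

minimum_loadings <- 0.3

# Zero out every loading whose absolute value is below the threshold.
# Negative loadings are kept when their magnitude meets the threshold.
cleaned <- loadings
cleaned[abs(cleaned) < minimum_loadings] <- 0

print(cleaned)
```

Because the comparison uses the absolute value, a loading of -0.45 survives while -0.25 is set to zero.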

Details

The function first checks that the number of factors is either supplied through `nfactors` or determined by parallel analysis (when `parallel.analysis = TRUE`). An internal helper, `trans()`, reverses and standardizes the scores of the features listed in `reverse_features`, and any features named in `exclude_features` are dropped. The EFA is then fitted with `psych::fa()` using the chosen extraction algorithm and rotation method. Loadings with an absolute value below `minimum_loadings` are set to zero, and the loading matrix is rounded and sorted. Factor scores are computed with `psych::factor.scores()`, reliability is estimated with the `omega()` function, and the factor correlations are extracted from the EFA object.
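
As a rough illustration of what reversing and standardizing an item can look like, the base R sketch below z-scores a vector and flips its sign. The helper name `reverse_and_standardize()` is hypothetical, and the package's internal `trans()` may be implemented differently.

```r
# Hypothetical helper, not part of HMDA: standardize to mean 0 and SD 1,
# then flip the sign so that high raw scores become low scores.
reverse_and_standardize <- function(x) {
  z <- (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
  -z
}

item <- c(1, 2, 3, 4, 5)          # invented item scores
reversed <- reverse_and_standardize(item)

print(reversed)
print(cor(item, reversed))        # perfectly negative correlation
```

Flipping the sign of a standardized item leaves its variance unchanged, so only the direction of its loadings is affected.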

Value

A list with the following components:

`parallel.analysis`: The output from the parallel analysis, if run.
`efa`: The full exploratory factor analysis object returned by `psych::fa()`.
`efa_loadings`: A matrix of factor loadings after zeroing out values below the `minimum_loadings` threshold, rounded and sorted.
`efa_reliability`: The reliability results (omega) computed from the factor scores.
`factor_correlations`: A matrix of factor correlations, rounded to 2 decimal places.
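
To show where loadings and factor correlations of this kind come from, the sketch below calls psych directly instead of `hmda.efa()`. It is an approximation under stated assumptions: `nfactors = 2` is an arbitrary illustrative choice, the object names are hypothetical, and psych may emit warnings for data as highly intercorrelated as these.

```r
# Sketch only: approximates some of the documented steps with psych directly.
# Runs only if the psych package is installed.
if (requireNamespace("psych", quietly = TRUE)) {
  data("USJudgeRatings")
  items <- USJudgeRatings[, -1]  # drop the first column, as in the Examples

  # Extraction and oblique rotation, mirroring algorithm = "minres" and
  # rotation = "promax". nfactors = 2 is an assumption for illustration.
  fit <- psych::fa(items, nfactors = 2, fm = "minres", rotate = "promax")

  # Loadings: zero out small values first, then round.
  cleaned_loadings <- unclass(fit$loadings)
  cleaned_loadings[abs(cleaned_loadings) < 0.3] <- 0
  cleaned_loadings <- round(cleaned_loadings, 2)

  # Factor correlations (Phi) are available for oblique rotations.
  factor_correlations <- if (!is.null(fit$Phi)) round(fit$Phi, 2) else NULL

  print(cleaned_loadings)
  print(factor_correlations)
}
```

Thresholding before rounding avoids keeping a loading such as 0.296 merely because it rounds up to 0.30.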

Author(s)

E. F. Haghish

Examples

  # Example: assess feature suitability for EFA with check_efa(), using the
  # USJudgeRatings dataset. This dataset contains lawyers' ratings of state
  # judges in the U.S. Superior Court.
  # Here, we check whether these rating variables are suitable for EFA.
  library(HMDA)
  data("USJudgeRatings")
  features_to_check <- colnames(USJudgeRatings[, -1])
  result <- check_efa(
    df = USJudgeRatings,
    features = features_to_check,
    min_unique = 3,
    verbose = TRUE
  )

  # TRUE indicates the features are suitable.
  print(result)

HMDA documentation built on April 4, 2025, 6:06 a.m.