In langenberg/subgroupsem: Subgroup Discovery in Structural Equation Models

View source: R/subgroupsem.R

subgroupsem

R Documentation

Subgroup discovery algorithms for use with structural equation models

Description

This is the main function of the package. It provides a flexible interface to the Python module pysubgroup for efficiently finding subgroups in structural equation models estimated with the R package lavaan.

Usage

subgroupsem(
  f_fit,
  dat,
  columns = names(dat),
  ignore = NULL,
  algorithm = "SimpleDFS",
  max_n_subgroups = 10L,
  search_depth = 3L,
  min_quality = 0,
  min_subgroup_size = NULL,
  weighting_attr = NULL,
  generalization_aware = FALSE,
  na_rm = FALSE,
  bw = NULL,
  verbose = FALSE,
  ...
)

Arguments

`f_fit`	Function to be fitted. Must take at least two arguments and has the signature `function(group, dat, ...)`. `group` is a numeric vector whose length equals the number of rows in the data frame `dat`; it is to be interpreted as an additional column indicating the group assignment. `f_fit` returns the interestingness measure. Returned values should be greater than `min_quality` in case of success and smaller in case of failure (e.g., non-convergence, error).
`dat`	A data frame.
`columns`	Column names of the provided data frame which are to be analysed. Columns must have ordinal or nominal scale.
`ignore`	Optional argument. If `columns = NULL`, every column that is not listed in `ignore` is selected.
`algorithm`	A character string specifying the subgroup discovery algorithm to use. 'SimpleDFS' (default) performs an exhaustive depth-first search. 'Beam' is intended to provide a heuristic (non-exhaustive) beam search, but is not yet implemented.
`max_n_subgroups`	Maximum number of subgroups. Default is 10.
`search_depth`	Maximum number of attribute combinations. Default is 3.
`min_quality`	Minimum value of interestingness measure. Values below will not be considered. Default is 0.
`min_subgroup_size`	Minimum size of a subgroup. Smaller subgroups will not be considered. The absolute minimum is 50 units and cannot be lowered. If NULL (default), the absolute minimum is applied.
`weighting_attr`	This option is deprecated.
`generalization_aware`	This option is deprecated.
`na_rm`	Logical. Default is FALSE. If set to TRUE, cases with NA values on any column will be set to FALSE in the `sg` vector. If set to FALSE, the corresponding entries in the `sg` vector will also be `NA`.
`bw`	Integer for beam width. Only used if algorithm is Beam search. Defaults to `max_n_subgroups`.
`verbose`	Logical. If `TRUE`, prints information about what is going on. Defaults to `FALSE`.
`...`	Additional arguments to be passed to `f_fit`. Currently not well implemented.
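
The return-value contract described for `f_fit` can be illustrated without fitting any model. The following is a minimal sketch in base R, not the package's own code: `f_fit_toy`, the column `y`, and the data frame `toy` are hypothetical names chosen for illustration, and the quality measure (a size-weighted mean difference) merely stands in for an SEM-based measure. It assumes the default `min_quality = 0`, so `-1` signals failure.

```r
# Hypothetical toy f_fit: no SEM is fitted. The quality is the absolute
# mean difference on column `y`, weighted by the square root of the
# subgroup size.
f_fit_toy <- function(group, dat, ...) {
    group <- as.numeric(group)
    n_in <- sum(group == 1, na.rm = TRUE)
    n_out <- sum(group == 0, na.rm = TRUE)

    # Failure: without two non-empty groups no comparison is possible,
    # so return a value below min_quality (assumed to be 0 here).
    if (n_in == 0 || n_out == 0) {
        return(-1)
    }

    rval <- tryCatch(
        {
            mean_in <- mean(dat$y[group == 1], na.rm = TRUE)
            mean_out <- mean(dat$y[group == 0], na.rm = TRUE)
            sqrt(n_in) * abs(mean_in - mean_out)
        },
        error = function(e) -1
    )

    # Guard against anything that is not a single, non-missing number.
    if (!is.numeric(rval) || length(rval) != 1 || is.na(rval)) {
        rval <- -1
    }
    rval
}

toy <- data.frame(y = c(1, 2, 3, 10, 11, 12))
q_ok <- f_fit_toy(c(0, 0, 0, 1, 1, 1), toy)  # valid split: positive quality
q_fail <- f_fit_toy(rep(1, 6), toy)          # everyone in the subgroup: -1
```

The same pattern (early return for degenerate groups, `tryCatch` around the fit, a final sanity check on the result) applies when the body fits a lavaan model instead.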

Value

List containing the time consumed and the groups.

Examples

if (FALSE) {
    library(lavaan)
    library(subgroupsem)

    model <- "
     eta1 =~ NA*x1 + x2 + x3
     eta2 =~ NA*x4 + x5 + x6
     eta3 =~ NA*x7 + x8 + x9

     eta1 ~~ 1*eta1
     eta2 ~~ 1*eta2
     eta3 ~~ 1*eta3

     eta1 + eta2 + eta3 ~ 0*1
     "

    f_fit <- function(sg, dat) {
        # Add subgroup to dataset (from logical to numeric)
        sg <- as.numeric(sg)
        dat$subgroup <- sg

        # If all participants are in the subgroup, no comparison group
        # exists: return -1 (below the default min_quality of 0)
        if (all(sg == 1, na.rm = TRUE)) {
            return(-1)
        }
        rval <- tryCatch(
            {
                # Fit Model
                fit <- sem(model, data = dat, group = "subgroup")
                stopifnot(lavInspect(fit, "post.check"))

                # Compute interestingness measure
                tmp <- partable(fit)
                lam1 <- tmp$est[
                    tmp$lhs == "eta1" &
                        tmp$op == "=~" & tmp$group == 1
                ]
                lam2 <- tmp$est[
                    tmp$lhs == "eta1" &
                        tmp$op == "=~" &
                        tmp$group == 2
                ]
                difflam <- abs(lam2 - lam1)
                rval <- sum(sg, na.rm = TRUE)^0.5 * sum(difflam)
            },
            error = function(e) -1
        )

        # Treat anything that is not a single numeric value as a failure
        if (!is.numeric(rval) || length(rval) != 1) {
            rval <- -1
        }

        return(rval)
    }

    m1 <- subgroupsem(
        f_fit = f_fit,
        dat = HolzingerSwineford1939,
        columns = c("sex", "school", "grade")
    )
    summary(m1)
}
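
The interestingness measure computed in `f_fit` above is the square root of the subgroup size multiplied by the sum of absolute differences between the two groups' loadings on `eta1`. As a worked illustration, the following sketch evaluates that formula with made-up numbers; the loading values and the subgroup size `n_sg` are hypothetical and are not estimates from `HolzingerSwineford1939`.

```r
# Hypothetical loadings of the three eta1 indicators in each group
# (illustrative values only, not model estimates).
lam1 <- c(0.90, 0.50, 0.70)
lam2 <- c(0.70, 0.80, 0.60)

# Hypothetical number of cases in the subgroup
n_sg <- 100

# Same formula as in f_fit: sqrt(n) * sum(|lam2 - lam1|)
difflam <- abs(lam2 - lam1)
quality <- n_sg^0.5 * sum(difflam)
```

Because of the square-root weighting, a subgroup four times as large with the same loading differences would score twice as high, so the measure trades off subgroup size against the magnitude of the parameter differences.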

langenberg/subgroupsem documentation built on Nov. 22, 2023, 2:37 a.m.