subgroupsem: Subgroup discovery algorithms for use with structural...

View source: R/subgroupsem.R

subgroupsem R Documentation

Subgroup discovery algorithms for use with structural equation models

Description

This is the main function of the package. It provides a flexible interface to the Python module pysubgroup for efficiently finding subgroups in structural equation models estimated with the R package lavaan.

Usage

subgroupsem(
  f_fit,
  dat,
  columns = names(dat),
  ignore = NULL,
  algorithm = "SimpleDFS",
  max_n_subgroups = 10L,
  search_depth = 3L,
  min_quality = 0,
  min_subgroup_size = NULL,
  weighting_attr = NULL,
  generalization_aware = FALSE,
  na_rm = FALSE,
  bw = NULL,
  verbose = FALSE,
  ...
)

Arguments

f_fit

Function to be fitted. It must take at least two arguments and has the signature function(group, dat, ...). group is a numeric vector whose length equals the number of rows in the data frame dat; it is to be interpreted as an additional column indicating the group assignment. f_fit returns the interestingness measure. Returned values should be greater than min_quality in case of success and smaller in case of failure (e.g., non-convergence, error).
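The contract described above (one numeric value per call, with failures mapped to a value below min_quality) can be sketched with a small wrapper. This is only an illustration: safe_fit, fail_value and toy_quality are hypothetical names that are not part of subgroupsem, and the toy measure stands in for a lavaan-based one.

```r
# Hypothetical helper (not part of subgroupsem): wraps a quality function so
# that it always returns a single finite number, or fail_value on failure.
safe_fit <- function(f, fail_value = -1) {
  function(group, dat, ...) {
    rval <- tryCatch(f(group, dat, ...), error = function(e) fail_value)
    if (!is.numeric(rval) || length(rval) != 1 || !is.finite(rval)) {
      rval <- fail_value
    }
    rval
  }
}

# Toy interestingness measure (an assumption for illustration only):
# absolute difference in the mean of y between subgroup and remainder.
toy_quality <- function(group, dat) {
  g <- as.numeric(group)
  if (all(g == 1) || all(g == 0)) stop("no comparison group")
  abs(mean(dat$y[g == 1]) - mean(dat$y[g == 0]))
}

toy_dat <- data.frame(y = c(1, 2, 3, 10, 11, 12))
f_fit_safe <- safe_fit(toy_quality)

# Regular case: the subgroup is the first three rows.
ok <- f_fit_safe(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), toy_dat)

# Failure case: everyone is in the subgroup, so toy_quality() raises an error.
failed <- f_fit_safe(rep(TRUE, 6), toy_dat)
```

With the default min_quality = 0, a fail_value of -1 keeps failed fits below the threshold.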

dat

A data frame.

columns

Column names of the provided data frame which are to be analysed. Columns must have ordinal or nominal scale.

ignore

Optional argument. If columns = NULL, ignore will be used to select every column that is not in ignore.
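How columns and ignore interact can be sketched as follows. The helper select_columns is hypothetical and only mirrors the behaviour described above; the package's internal selection code may differ.

```r
# Assumed selection logic (illustration only, not the package's own code).
select_columns <- function(dat, columns = names(dat), ignore = NULL) {
  if (is.null(columns)) {
    # Use every column that is not listed in ignore.
    columns <- setdiff(names(dat), ignore)
  }
  columns
}

toy_dat <- data.frame(
  x1 = 1:4,
  x2 = 4:1,
  sex = c(1, 2, 1, 2),
  school = c("a", "a", "b", "b")
)

selected <- select_columns(toy_dat, columns = NULL, ignore = c("x1", "x2"))
```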

algorithm

A character string specifying the subgroup discovery algorithm to use. 'SimpleDFS' (default) performs an exhaustive depth-first search. 'Beam' is intended to provide a heuristic (non-exhaustive) beam search, but it is not yet implemented.

max_n_subgroups

Maximum number of subgroups. Default is 10.

search_depth

Maximum number of attributes that are combined in a single subgroup description. Default is 3.

min_quality

Minimum value of the interestingness measure. Subgroups with values below this threshold will not be considered. Default is 0.

min_subgroup_size

Minimum size of a subgroup. Subgroups smaller than this will not be considered. The absolute minimum is 50 units and cannot be lowered. If NULL (default), the absolute minimum is applied.
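One reading of this rule is that requested sizes below 50 are raised to 50. The sketch below encodes that reading; effective_min_size is a hypothetical helper, and whether the package raises smaller values silently or rejects them is an assumption here.

```r
# Assumed behaviour (illustration only): never go below the absolute minimum.
effective_min_size <- function(min_subgroup_size = NULL, absolute_min = 50L) {
  if (is.null(min_subgroup_size)) {
    return(absolute_min)
  }
  max(as.integer(min_subgroup_size), absolute_min)
}

size_default <- effective_min_size()     # NULL falls back to the absolute minimum
size_small   <- effective_min_size(20)   # below the absolute minimum
size_large   <- effective_min_size(120)  # above the absolute minimum
```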

weighting_attr

This option is deprecated.

generalization_aware

This option is deprecated.

na_rm

Logical. Default is FALSE. If TRUE, cases with NA values on any column are set to FALSE in the sg vector. If FALSE, the corresponding entries in the sg vector are also NA.
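The effect on the sg vector can be illustrated with base R alone. The toy data and the single-column selector sex == 1 are assumptions for illustration; the package builds the vector internally.

```r
toy_dat <- data.frame(sex = c(1, 2, NA, 1))

# Behaviour described for na_rm = FALSE: a missing value on the selecting
# column yields NA in the subgroup indicator.
sg <- toy_dat$sex == 1

# Behaviour described for na_rm = TRUE: cases with missing values are
# treated as not belonging to the subgroup.
sg_na_rm <- sg
sg_na_rm[is.na(sg_na_rm)] <- FALSE
```

With na_rm = FALSE, f_fit has to cope with NA entries itself, for example by using na.rm = TRUE when counting subgroup members.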

bw

Integer giving the beam width. Only used if algorithm = 'Beam'. Defaults to max_n_subgroups.

verbose

Logical. If TRUE, progress information is printed during the search. Defaults to FALSE.

...

Additional arguments to be passed to f_fit. This is currently not well implemented.

Value

List containing the time consumed and the groups.

Examples

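if (FALSE) {
    # lavaan provides sem(), lavInspect(), partable() and the
    # HolzingerSwineford1939 data used below
    library(lavaan)
    library(subgroupsem)
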
    model <- "
     eta1 =~ NA*x1 + x2 + x3
     eta2 =~ NA*x4 + x5 + x6
     eta3 =~ NA*x7 + x8 + x9

     eta1 ~~ 1*eta1
     eta2 ~~ 1*eta2
     eta3 ~~ 1*eta3

     eta1 + eta2 + eta3 ~ 0*1
     "

    f_fit <- function(sg, dat) {
        # Add subgroup to dataset (from logical to numeric)
        sg <- as.numeric(sg)
        dat$subgroup <- sg

        # If all participants are in the subgroup, there is no comparison
        # group; return 0, which does not exceed the default min_quality
        if (all(sg == 1)) {
            rval <- 0
            return(rval)
        }
        rval <- tryCatch(
            {
                # Fit Model
                fit <- sem(model, data = dat, group = "subgroup")
                stopifnot(lavInspect(fit, "post.check"))

                # Compute interestingness measure
                tmp <- partable(fit)
                lam1 <- tmp$est[
                    tmp$lhs == "eta1" &
                        tmp$op == "=~" & tmp$group == 1
                ]
                lam2 <- tmp$est[
                    tmp$lhs == "eta1" &
                        tmp$op == "=~" &
                        tmp$group == 2
                ]
                difflam <- abs(lam2 - lam1)
                rval <- sum(sg, na.rm = TRUE)^0.5 * sum(difflam)
            },
            error = function(e) -1
        )

        # Treat anything other than a single numeric value as a failure
        if (!is.numeric(rval) || length(rval) != 1) {
            rval <- -1
        }

        return(rval)
    }

    m1 <- subgroupsem(
        f_fit = f_fit,
        dat = HolzingerSwineford1939,
        columns = c("sex", "school", "grade")
    )
    summary(m1)
}
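The interestingness measure computed inside f_fit above is the square root of the subgroup size times the summed absolute differences between the two groups' loadings on eta1. As a stand-alone sketch with made-up loadings (loading_quality and the numbers are hypothetical, not estimates from HolzingerSwineford1939):

```r
# Same formula as in the example: sqrt(n) * sum(|lam2 - lam1|).
loading_quality <- function(n_subgroup, lam1, lam2) {
  sqrt(n_subgroup) * sum(abs(lam2 - lam1))
}

# Hypothetical loadings for a subgroup of 100 cases.
q <- loading_quality(
  n_subgroup = 100,
  lam1 = c(1.0, 1.0, 1.0),
  lam2 = c(0.5, 1.0, 1.5)
)
```

The square-root factor rewards larger subgroups, so a small subgroup needs larger loading differences to reach the same quality.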

langenberg/subgroupsem documentation built on Nov. 22, 2023, 2:37 a.m.