Regularized Structural Equation Modeling

Description

Regularized Structural Equation Modeling

Usage

regsem(model, lambda = 0, alpha = 0, type = "none", data = NULL,
  optMethod = "default", gradFun = "ram", hessFun = "none",
  parallel = "no", Start = "lavaan", subOpt = "nlminb", longMod = F,
  pars_pen = NULL, diff_par = NULL, LB = -Inf, UB = Inf, block = TRUE,
  full = FALSE, calc = "normal", max.iter = 500, tol = 1e-05,
  solver = FALSE, solver.maxit = 5, alpha.inc = TRUE, step = 0.5,
  momentum = FALSE, step.ratio = FALSE, nlminb.control = list(),
  missing = "listwise")

Arguments

model

Lavaan output object: a model previously run with any of the main lavaan functions, cfa(), lavaan(), sem(), or growth(). It can also come from the efaUnrotate() function in the semTools package. Currently, regsem cannot handle multiple-group models, missing data other than listwise deletion, thresholds from categorical variable models, or estimators other than ML (most notably WLSMV for categorical variables). Note: the model does not have to actually run (use do.fit=FALSE) or converge. regsem() uses the lavaan object mainly as a parser and to obtain the sample covariance matrix.

lambda

Penalty value. Note: higher values will result in additional convergence issues. If using values > 0.1, it is recommended to use multi_optim() instead. See multi_optim for more detail.

alpha

Mixture parameter for the elastic net. Not currently working when applied.

type

Penalty type. Options include "none", "lasso", "ridge", "enet" for the elastic net, "alasso" for the adaptive lasso, "scad", "mcp", and "diff_lasso". diff_lasso penalizes the discrepancy between parameter estimates and some pre-specified values. The values to take the deviation from are specified in diff_par.

data

Optional data frame. Only required for missing="fiml", which is not currently working.

optMethod

Solver to use. Recommended options include "nlminb" and "optimx". Note: for "optimx", the default method is to use nlminb. This can be changed in subOpt.

gradFun

Gradient function to use. Recommended to use "ram", which refers to the method specified in von Oertzen & Brick (2014). The "norm" procedure uses the forward difference method for calculating the gradient. This is slower and less accurate.

hessFun

Hessian function to use. Recommended to use "ram", which refers to the method specified in von Oertzen & Brick (2014). The "norm" procedure uses the forward difference method for calculating the Hessian. This is slower and less accurate.

parallel

Whether to parallelize the processes. The default shown in Usage is "no".

Start

Type of starting values to use. Only "default" is recommended; it sets factor loadings and variances to 0.5. Start = "lavaan" uses the parameter estimates from the lavaan model object. This is not recommended, as it can increase the chance of getting stuck at the previous parameter estimates.

subOpt

Type of optimization to use in the optimx package.

longMod

Whether the model uses longitudinal data. If TRUE, this changes the sample covariance used.

pars_pen

Parameter indicators to penalize. If left NULL, by default, all parameters in the A matrix outside of the intercepts are penalized when lambda > 0 and type != "none".

diff_par

Parameter values to deviate from. Only used when type="diff_lasso".

LB

Lower bound vector. Note: this is very important to specify when using regularization, as it greatly increases the chances of converging.

UB

Upper bound vector.

block

Whether to use block coordinate descent.

full

Whether to do full gradient descent rather than block.

calc

Type of calc function to use, with or without means. Not recommended for use.

max.iter

Number of iterations for coordinate descent

tol

Tolerance for coordinate descent

solver

Whether to use solver for coord_desc

solver.maxit

Max iterations for solver in coord_desc

alpha.inc

Whether alpha should increase for coord_desc

step

Step size

momentum

Logical for coord_desc

step.ratio

Logical. Ratio of step size between A and S.

nlminb.control

List of control values to pass to nlminb.

missing

How to handle missing data. Current options are "listwise" and "fiml". Note that "fiml" is not currently working well.
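
As a rough illustration of how the type, lambda, diff_par, LB and UB arguments relate, the sketch below writes a few of the penalties in their textbook form and minimizes a toy penalized objective under box constraints with nlminb() from base R. It is not regsem's internal code. The helper penalty(), the toy objective, and the convention that alpha weights the lasso part of the elastic net are assumptions made for this example only.

```r
# Hypothetical helper: textbook forms of some penalty types named above.
# Assumption: for "enet", alpha weights the lasso part (regsem may differ).
penalty <- function(theta, lambda,
                    type = c("none", "lasso", "ridge", "enet", "diff_lasso"),
                    alpha = 0.5, diff_par = NULL) {
  type <- match.arg(type)
  switch(type,
    none       = 0,
    lasso      = lambda * sum(abs(theta)),
    ridge      = lambda * sum(theta^2),
    enet       = lambda * (alpha * sum(abs(theta)) +
                           (1 - alpha) * sum(theta^2)),
    diff_lasso = lambda * sum(abs(theta - diff_par))
  )
}

# Toy stand-in for a fit function: squared distance from a target vector.
target <- c(0.8, -0.3, 1.5)
lambda <- 0.1
penalized_obj <- function(theta) {
  sum((theta - target)^2) + penalty(theta, lambda, type = "lasso")
}

# Box constraints play the role of LB and UB.
lower <- c(0, -1, 0)
upper <- c(1,  1, 1)
start <- rep(0.5, 3)

fit_bounded <- nlminb(start = start, objective = penalized_obj,
                      lower = lower, upper = upper,
                      control = list(iter.max = 200))

fit_bounded$par          # estimates stay inside [lower, upper]
fit_bounded$convergence  # 0 indicates nlminb reported convergence
```

In this toy problem the third target value lies outside the upper bound, so the estimate is held at the bound instead of drifting further out.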

Value

out List of return values from optimization program

convergence Convergence status. 0 = converged, 1 or 99 means the model did not converge.

par.ret Final parameter estimates

Imp_Cov Final implied covariance matrix

grad Final gradient.

KKT1 Whether the final gradient values were close enough to 0.

KKT2 Whether the final Hessian was positive definite.

df Final degrees of freedom. Note that df changes with lasso penalties.

npar Final number of free parameters. Note that this can change with lasso penalties.

SampCov Sample covariance matrix.

fit Final F_ml fit. Note this is the final parameter estimates evaluated with the F_ml fit function.

coefficients Final parameter estimates

nvar Number of variables.

N Sample size.

nfac Number of factors

baseline.chisq Baseline chi-square.

baseline.df Baseline degrees of freedom.
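
To make the fit, KKT1 and KKT2 entries more concrete, the sketch below computes the standard maximum likelihood discrepancy between a sample covariance matrix and an implied covariance matrix, and runs a simple check of the two KKT conditions. The helpers f_ml() and kkt_check() and the tolerance of 1e-3 are assumptions for illustration. regsem's own implementation and thresholds may differ.

```r
# Hypothetical helper: standard ML discrepancy
# F_ml = log|Sigma| + tr(S Sigma^-1) - log|S| - p
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  logdet(Sigma) + sum(diag(S %*% solve(Sigma))) - logdet(S) - p
}

# Hypothetical helper mirroring the KKT1 and KKT2 descriptions.
# Assumption: the tolerance is an arbitrary choice for this sketch.
kkt_check <- function(grad, hess, tol = 1e-3) {
  ev <- eigen(hess, symmetric = TRUE, only.values = TRUE)$values
  list(KKT1 = all(abs(grad) < tol),  # gradient close enough to 0
       KKT2 = all(ev > 0))           # Hessian positive definite
}

# Toy 2 x 2 sample covariance matrix.
S <- matrix(c(1, 0.5,
              0.5, 1), nrow = 2, byrow = TRUE)

f_ml(S, S)        # implied matrix reproduces S exactly
f_ml(S, diag(2))  # implied matrix ignores the covariance

kkt_check(grad = c(1e-6, 0), hess = diag(2))
```

The discrepancy is zero when the implied covariance matrix equals the sample covariance matrix and grows as the two diverge.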

Examples

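library(lavaan)
library(regsem)

# Standardize the nine test scores (columns 7 to 15)
HS <- data.frame(scale(HolzingerSwineford1939[,7:15]))

# One-factor model with all nine indicators
mod <- '
f =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
'
# Recommended to specify meanstructure in lavaan
outt <- cfa(mod, HS, meanstructure=TRUE)

# Lasso penalty on the parameters with indices 1, 2, 6, 7 and 8
fit1 <- regsem(outt, lambda=0.05, type="lasso", pars_pen=c(1:2,6:8))
#summary(fit1)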