fitRegModels: fitRegModels
In jhorzek/laremm: LASSO Regularization in mxModels

Description

Note: laremm is based on the R package regsem. Because laremm is at an early stage of development, using regsem instead is recommended. fitRegModels creates a regularized model from an mxModel or ctsem object and then fits this model over a range of penalty values.
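
As a rough illustration of what fitting a model over a range of penalty values means, the sketch below adds a lasso penalty (the penalty value times the sum of the absolute values of the selected parameters) to an unpenalized fit value. The helper `penalized_fit` and all numbers are hypothetical. The exact scaling that laremm applies internally is not documented on this page, so this is a sketch under assumptions and not the package's implementation.

```r
# Hypothetical helper, not part of laremm: generic lasso-penalized fit value.
penalized_fit <- function(fit_value, params, selected, penalty) {
  # Only parameters flagged with 1 in `selected` contribute to the penalty.
  fit_value + penalty * sum(abs(params[selected == 1]))
}

params   <- c(a = 0.8, b = -0.2, c = 0.05)  # illustrative parameter values
selected <- c(1, 1, 0)                      # regularize a and b only

# Evaluate the same parameters under an increasing penalty.
penalties  <- c(0, 0.1, 0.2)
pen_values <- sapply(penalties, function(p) penalized_fit(100, params, selected, p))
print(pen_values)
```

With a penalty of 0 the fit value is unchanged; larger penalties make non-zero values of the selected parameters increasingly costly, which is what pushes them toward zero during optimization.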

Usage

fitRegModels(model, model_type = "ctsem", fitfun = "FIML",
  data_type = "raw", pen_type = "lasso", pen_on = "none",
  selectedDrifts = "none", selectedA = "none", selectedS = "none",
  pen_start = 0, pen_end = 1, pen_stepsize = 0.01,
  fit_index = "BIC", CV = FALSE, Test_Sample = NULL,
  zeroThresh = 0.001, setZero = FALSE, driftexpo = TRUE,
  DRIFT_dt = 1)

Arguments

`model`	mxModel or ctsem object
`model_type`	type of model provided: "ctsem" or "mxModel"
`fitfun`	fit function to be used in the fitting procedure. Either "FML" or "FIML"
`data_type`	type of data in the model. Either "cov" or "raw"
`pen_on`	character vector naming the matrices that should be regularized. Possible values are combinations of "A", "S", and "DRIFT"
`selectedDrifts`	drift values to regularize. Either "all", "cross", or "auto", or a matrix of the same size as the drift matrix with 1 for every parameter to regularize and 0 for every parameter that is not regularized
`selectedA`	A values to regularize. Either "all" or a matrix of the same size as the A matrix with 1 for every parameter to regularize and 0 for every parameter that is not regularized
`selectedS`	S values to regularize. Either "all" or a matrix of the same size as the S matrix with 1 for every parameter to regularize and 0 for every parameter that is not regularized
`pen_start`	lowest penalty value to evaluate. Recommended: 0
`pen_end`	highest penalty value to evaluate
`pen_stepsize`	increase of the penalty with each iteration. E.g., if pen_start = 0, pen_end = 1, and pen_stepsize = .1, fitRegModels will iterate over pen = 0, pen = .1, pen = .2, ...
`fit_index`	fit index used to select the best model. Possible values are "AIC", "BIC", "CV_m2LL", "CV_AIC", and "CV_BIC"
`CV`	should cross-validation be performed? If TRUE, provide a Test_Sample
`Test_Sample`	mxData object with the test sample data. Must be of the same data_type as the training data set
`zeroThresh`	threshold below which regularized parameters are evaluated as zero. Default is .001, as in regsem
`setZero`	should parameters below zeroThresh be set to zero in all fit calculations? Default is FALSE, as in regsem
`driftexpo`	specify whether the regularization is performed on the raw drift matrix or on the exponential of the drift matrix (discrete-time parameters)
`DRIFT_dt`	the discrete time points for which the drift will be regularized. A vector with multiple values is possible
`pen_type`	type of penalty. So far, only "lasso" is implemented
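
The sketch below illustrates, in base R only, how several of these arguments can be read: the penalty grid implied by `pen_start`, `pen_end`, and `pen_stepsize`; 0/1 selection matrices of the kind accepted by `selectedDrifts`; and a `zeroThresh`-style check. It assumes that "cross" refers to the off-diagonal drift entries and "auto" to the diagonal ones, which is the usual meaning of cross- and auto-effects but is not spelled out above. The drift estimates are made-up numbers, and none of this is laremm's internal code.

```r
# Penalty grid implied by pen_start = 0, pen_end = 1, pen_stepsize = 0.1
pen_grid <- seq(from = 0, to = 1, by = 0.1)
length(pen_grid)  # 11 penalty values

# 0/1 selection matrices for a 2 x 2 drift matrix
# (assumption: "cross" = off-diagonal entries, "auto" = diagonal entries)
select_cross <- 1 - diag(2)
select_auto  <- diag(2)

# Made-up drift estimates, for illustration only
drift_est <- matrix(c(-0.30, 0.20, 0.0004, -0.50), ncol = 2)

# Regularized entries whose absolute value falls below the threshold
# are treated as zero; non-regularized entries are never flagged.
zeroThresh <- 0.001
is_zero <- select_cross == 1 & abs(drift_est) < zeroThresh
print(is_zero)
```

Passing an explicit matrix instead of a keyword gives finer control, for example regularizing only one of the two cross-effects.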

Author(s)

Jannik Orzek

Examples

# The following example is taken from the regsem help to demonstrate the equivalence of both methods:

library(lavaan)
library(OpenMx)
library(laremm) # provides fitRegModels
# put variables on same scale for regsem
HS <- data.frame(scale(HolzingerSwineford1939[,7:15]))

# define variables:
latent <- c("f1")
manifest <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9")

# define paths (the first loading is fixed to 1, all others are free):
loadings <- mxPath(from = latent, to = manifest, free = c(FALSE, rep(TRUE, 8)), values = 1)
lcov <- mxPath(from = latent, arrows = 2, free = TRUE, values = 1)
lmanif <- mxPath(from = manifest, arrows = 2, free = TRUE, values = 1)

# define model:
myModel <- mxModel(name = "myModel", latentVars = latent, manifestVars = manifest, type = "RAM",
                   mxData(observed = HS, type = "raw"), loadings, lcov, lmanif,
                   mxPath(from = "one", to = manifest, free = TRUE)
)

fit_myModel <- mxRun(myModel)
summary(fit_myModel)

# create regularized model:

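# selection matrix for A: 1 = regularize this parameter, 0 = leave it unregularized
selectedA <- matrix(0, ncol = ncol(fit_myModel$A$values), nrow = nrow(fit_myModel$A$values))
# rows 2, 3, 7, 8, 9 with column 10: assuming the A matrix is ordered manifest
# variables first (x1..x9), then latent (f1), these are the loadings of
# x2, x3, x7, x8, and x9 on f1
selectedA[c(2, 3, 7, 8, 9), 10] <- 1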


reg_model <- fitRegModels(model = fit_myModel, model_type = "mxModel", fitfun = "FIML",
                          pen_on = "A", selectedA = selectedA,
                          pen_start = 0, pen_end = .05, pen_stepsize = .01
                          )
summary(reg_model)
reg_model$`fit measures`

### use laremm in ctsem ####
library(ctsem)

set.seed(12)

## define the population model:

# set the drift matrix. Note that drift eta1_eta2 is set to 0 in the population.
ct_drift <- matrix(c(-.3,.2,0,-.5), ncol = 2)

generatingModel<-ctModel(Tpoints=10,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
                         MANIFESTVAR=diag(0,2),
                         LAMBDA=diag(1,2),
                         DRIFT=ct_drift,
                         DIFFUSION=matrix(c(.5,0,0,.5),2),
                         CINT=matrix(c(0,0),nrow=2),
                         T0MEANS=matrix(0,ncol=1,nrow=2),
                         T0VAR=diag(1,2))

# simulate a training data set
traindata <- ctGenerate(generatingModel,n.subjects = 100)

## Build the analysis model. Note that drift eta1_eta2 is freely estimated
# although it is 0 in the population.

myModel <- ctModel(Tpoints=10,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
                   LAMBDA=diag(1,2),
                   MANIFESTVAR=diag(0,2),
                   CINT=matrix(c(0,0),nrow=2),
                   DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
                   T0MEANS=matrix(0,ncol=1,nrow=2),
                   T0VAR=diag(1,2))

# fit the model using ctsem:
fit_myModel <- ctFit(traindata, myModel)
fit_myModel$mxobj$DRIFT$values

# regularize the model:
library(laremm)

# start regularization:
reg_myModel <- fitRegModels(model = fit_myModel, model_type = "ctsem",
                            fitfun = "FIML",data_type = "raw",
                            pen_on = "DRIFT", selectedDrifts = "cross",
                            pen_start = 0, pen_end = 1, pen_stepsize = .1)

# show the best value for penalty term (tuning parameter):
reg_myModel$`best penalty`

# show summary of parameters:
summary(reg_myModel)

#### additional cross-validation ####
set.seed(15)
# simulate a new data set from the same population:
testdata <- ctGenerate(generatingModel,n.subjects = 100)
# note: ctsem renames the rows and columns when fitting a model. To get
# the right names, we fit the ctsem model with the new dataset and then extract
# the dataset, where rows and columns have now been renamed

# use a separate object here so that fit_myModel still holds the fit to the training data
fit_testModel <- ctFit(testdata, myModel, useOptimizer = FALSE)
testdata <- fit_testModel$mxobj$data

# fit models with cross-validation:
cv_reg_myModel <- fitRegModels(model = fit_myModel, model_type = "ctsem",
                               fitfun = "FIML",data_type = "raw",pen_on = "DRIFT",
                               selectedDrifts = "cross", pen_start = 0,
                               pen_end = 1, pen_stepsize = .1, CV = TRUE,
                               Test_Sample = testdata, fit_index = "CV_BIC")
# show the summary:
summary(cv_reg_myModel)
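
For intuition about `driftexpo` and `DRIFT_dt`: in continuous-time models, the discrete-time autoregressive and cross-lagged effects for a time interval dt are given by the matrix exponential of the drift matrix multiplied by dt. The sketch below computes this for the population drift matrix from the example above, using an eigendecomposition in base R. The helper `expm_eigen` is hypothetical (it is not part of laremm or ctsem) and assumes a diagonalizable matrix; how laremm computes the exponential internally is not shown on this page.

```r
# Hypothetical helper: matrix exponential via eigendecomposition.
# Assumes `m` is diagonalizable (true here: distinct eigenvalues -0.3 and -0.5).
expm_eigen <- function(m) {
  e <- eigen(m)
  Re(e$vectors %*% diag(exp(e$values)) %*% solve(e$vectors))
}

# Population drift matrix from the example (drift eta1_eta2 is 0)
ct_drift <- matrix(c(-.3, .2, 0, -.5), ncol = 2)

# Discrete-time effects for the intervals dt = 1 and dt = 2
dt_effects_1 <- expm_eigen(ct_drift * 1)
dt_effects_2 <- expm_eigen(ct_drift * 2)
print(round(dt_effects_1, 3))
print(round(dt_effects_2, 3))
```

In this two-process example the zero drift entry remains zero after exponentiation. With more than two processes, a zero drift entry can still imply a non-zero discrete-time effect through indirect paths, which is one reason the choice between penalizing the raw drift and its exponential matters.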

jhorzek/laremm documentation built on Sept. 16, 2022, 12:06 p.m.