In lcmm: Extended Mixed Models Using Latent Classes and Latent Processes

externVar

R Documentation

Estimation of a secondary regression model after the estimation of a primary latent class model

Description

This function fits regression models to relate a latent class structure (stemming from a latent class model estimated with the lcmm package) to either an external outcome or external class predictors. Two inference techniques are implemented. Both account for the classification error in the posterior class assignment:

- a 2-stage estimation using the joint likelihood of the primary latent class model and of the secondary/external regression;

- a conditional regression of the external outcome given the underlying latent class structure, or of the underlying class structure given external covariates.

It returns an object of one of the lcmm package classes.

Usage

externVar(
  model,
  fixed,
  mixture,
  random,
  subject,
  classmb,
  survival,
  hazard = "Weibull",
  hazardtype = "Specific",
  hazardnodes = NULL,
  TimeDepVar = NULL,
  logscale = FALSE,
  idiag = FALSE,
  nwg = FALSE,
  randomY = NULL,
  link = NULL,
  intnodes = NULL,
  epsY = NULL,
  cor = NULL,
  nsim = NULL,
  range = NULL,
  data,
  longitudinal,
  method,
  varest,
  M = 200,
  B,
  convB = 1e-04,
  convL = 1e-04,
  convG = 1e-04,
  maxiter = 100,
  posfix,
  partialH = FALSE,
  verbose = FALSE,
  nproc = 1
)

Arguments

`model`	an object inheriting from class `hlme`, `lcmm`, `Jointlcmm`, `multlcmm` or `mpjlcmm` giving the primary latent class model.
`fixed`	optional, for secondary analyses on an external outcome variable: two-sided linear formula object for specifying the outcome and fixed-effect part in the secondary model. The response outcome is on the left of `~` and the covariates are separated by `+` on the right of the `~`. The right side should be `~1` to model the outcome according to the latent classes only.
`mixture`	optional, for secondary analyses on an external outcome variable: one-sided formula object for the class-specific fixed effects in the model for the external outcome. Among the list of covariates included in fixed, the covariates with class-specific regression parameters are entered in mixture separated by `+`. By default, an intercept is included. If no intercept, `-1` should be the first term included.
`random`	optional, for secondary analyses on an external outcome variable: one-sided linear formula object for specifying the random effects in the secondary model, if appropriate. By default, no random effect is included.
`subject`	name of the covariate representing the grouping structure. It must be specified even in the absence of a hierarchical structure.
`classmb`	optional, for secondary analyses on latent class membership according to external covariates: one-sided formula specifying the external predictors of latent class membership to be modeled in the secondary class-membership multinomial logistic model. Covariates are separated by `+` on the right of the `~`.
`survival`	optional, for secondary analyses on an external survival outcome: two-sided formula specifying the external survival part of the model. The right side should be `~1` to get the survival associated to each latent class without any other covariate.
`hazard`	optional, for secondary analyses on an external survival outcome: family of hazard function assumed for the survival model (Weibull, piecewise or splines)
`hazardtype`	optional, for secondary analyses on an external survival outcome: indicator for the type of baseline risk function (Specific, PH or Common)
`hazardnodes`	optional, for secondary analyses on an external survival outcome: vector containing interior nodes if `splines` or `piecewise` is specified for the baseline hazard function in `hazard`
`TimeDepVar`	optional, for secondary analyses on an external survival outcome: vector specifying the name of the time-dependent covariate in the survival model (only an irreversible event time is allowed)
`logscale`	optional, for secondary analyses on an external survival outcome: boolean indicating whether an exponential (logscale=TRUE) or a square (logscale=FALSE, the default) transformation is used to ensure positivity of parameters in the baseline risk functions
`idiag`	optional, for secondary analyses on an external outcome: if appropriate, logical for the structure of the variance-covariance matrix of the random-effects in the secondary model. If `FALSE`, a non structured matrix of variance-covariance is considered (by default). If `TRUE` a diagonal matrix of variance-covariance is considered.
`nwg`	optional, for secondary analyses on an external outcome: if appropriate, logical indicating if the variance-covariance of the random-effects in the secondary model is class-specific. If `FALSE` the variance-covariance matrix is common over latent classes (by default). If `TRUE` a class-specific proportional parameter multiplies the variance-covariance matrix in each class (the proportional parameter in the last latent class equals 1 to ensure identifiability).
`randomY`	optional, for secondary analyses on an external outcome: if appropriate, logical for including an outcome-specific random intercept. If FALSE no outcome-specific random intercept is added (default). If TRUE independent outcome-specific random intercepts with parameterized variance are included.
`link`	optional, for secondary analyses on an external outcome: if appropriate, family of parameterized link functions for the external outcome. Defaults to NULL, corresponding to a continuous Gaussian distribution (hlme function).
`intnodes`	optional, for secondary analyses on an external outcome: if appropriate, vector of interior nodes. This argument is only required for an I-splines link function with nodes entered manually.
`epsY`	optional, for secondary analyses on an external outcome: if appropriate, positive real number used to rescale the marker in (0,1) when the beta link function is used. By default, epsY=0.5.
`cor`	optional, for secondary analyses on an external outcome: if appropriate, indicator for inclusion of an autocorrelated Gaussian process in the linear (latent process) mixed model. Option "BM" indicates a Brownian motion with parameterized variance. Option "AR" specifies an autoregressive process of order 1 with parameterized variance and correlation intensity. Each option should be followed by the time variable in brackets as `cor=BM(time)`. By default, no autocorrelated Gaussian process is added.
`nsim`	optional, for secondary analyses on an external outcome: if appropriate, number of points to be used in the estimated link function. By default, nsim=100.
`range`	optional, for secondary analyses on an external outcome: if appropriate, vector indicating the range of the outcomes (that is the minimum and maximum). By default, the range is defined according to the minimum and maximum observed values of the outcome. The option should be used only for Beta and Splines transformations.
`data`	Data frame containing the variables named in `fixed`, `mixture`, `random`, `classmb` and `subject`, for both the current function arguments and the primary model arguments. See Details for information on the data structure, especially with external outcomes.
`longitudinal`	only with `mpjlcmm` primary models and "twoStageJoint" method: mandatory list containing the longitudinal submodels used in the primary latent class model.
`method`	character indicating the inference technique to be used: `"twoStageJoint"` corresponds to 2-stage estimation using the joint log-likelihood. `"conditional"` corresponds to the conditional regression using the underlying true latent class membership.
`varest`	optional character indicating the method to be used to compute the variance of the regression estimates in the secondary regression. `"none"` does not account for the uncertainty in the primary latent class model, `"paramBoot"` computes the total variance using a parametric bootstrap technique, `"Hessian"` computes the total Hessian of the joint likelihood (implemented for the `"twoStageJoint"` method only). Defaults to `"Hessian"` for the `"twoStageJoint"` method and `"paramBoot"` for the `"conditional"` method.
`M`	optional integer indicating the number of draws for the parametric bootstrap when `varest="paramBoot"`. Defaults to 200.
`B`	optional vector of initial parameter values for the secondary model. With an external outcome, the vector has the same structure as a latent class model estimated in the other functions of the `lcmm` package for the same type of outcome, except that no parameters should be included for the latent class membership. With external class predictors (of size p), the vector is of length (ng-1)*(1+p). If `B=NULL` (by default), internal initial values are considered.
`convB`	optional threshold for the convergence criterion based on the parameter stability. By default, convB=0.0001.
`convL`	optional threshold for the convergence criterion based on the log-likelihood stability. By default, convL=0.0001.
`convG`	optional threshold for the convergence criterion based on the derivatives. By default, convG=0.0001.
`maxiter`	optional maximum number of iterations for the secondary model estimation using the Marquardt iterative algorithm. Defaults to 100.
`posfix`	optional vector specifying the indices in the parameter vector B of the secondary model that should not be estimated. Defaults to NULL: all the parameters of the secondary regression are estimated.
`partialH`	optional logical for Piecewise and Splines baseline risk functions and Splines link functions only. Indicates whether the parameters of the baseline risk or link functions can be dropped from the Hessian matrix to define convergence criteria (can solve non convergence due to estimates at the boundary of the parameter space - usually 0).
`verbose`	logical indicating whether information about computation should be reported. Default to FALSE.
`nproc`	the number of cores for parallel computation. Defaults to 1 (sequential mode).
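
As a quick check on the length of `B` with external class predictors, the formula (ng-1)*(1+p) can be evaluated directly. The sketch below assumes a 2-class primary model and four numeric predictors such as X1-X4 in the Examples. The helper `n_classmb_par` is hypothetical and not part of lcmm, and the sketch says nothing about the order of the parameters inside `B`.

```r
# Hypothetical helper (not part of lcmm): expected length of B when the
# secondary model is the class-membership multinomial logistic regression.
n_classmb_par <- function(ng, p) {
  # (ng - 1) classes are modeled against a reference class, each with
  # one intercept plus p covariate effects
  (ng - 1) * (1 + p)
}

# Assumed setting: ng = 2 latent classes, p = 4 external predictors
n_B <- n_classmb_par(ng = 2, p = 4)
n_B
```

A user-supplied `B` of a different length would not match the formula stated for `B` above.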

Details

A. DATA STRUCTURE

The data argument must follow a specific structure. It must include all the data necessary to compute the posterior classification probabilities (so usually a longitudinal format) as well as the information for the secondary analysis. For time-invariant variables in the secondary analyses:

- if used as an external outcome: the information should not be duplicated at each row of the subject. It should appear once for each individual;

- if used as an external covariate: the information can be duplicated at each row of the subject (as usual).
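
One possible way to prepare such a data set in base R is sketched below. All variable names and values are illustrative. Setting the external outcome to NA on the remaining rows is an assumption about how to make the value "appear once" and is not taken from the package documentation, so it should be checked against the behavior of the installed lcmm version.

```r
# Toy long-format data: 3 subjects with 3 visits each (illustrative names)
long <- data.frame(
  ID    = rep(1:3, each = 3),
  Time  = rep(0:2, times = 3),
  Ydep1 = c(10, 11, 12, 20, 19, 18, 15, 15, 16),
  X1    = rep(c(0, 1, 0), each = 3),       # external covariate: repeated on every row
  Yext  = rep(c(5.2, 7.1, 6.3), each = 3)  # external outcome: duplicated at this point
)

# Keep the time-invariant external outcome on the first row of each subject
# and blank it elsewhere (assumption: NA marks "no information" on other rows)
first_row <- !duplicated(long$ID)
long$Yext[!first_row] <- NA

# Each subject now carries exactly one non-missing Yext value
n_per_subject <- tapply(!is.na(long$Yext), long$ID, sum)
n_per_subject
```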

B. VARIANCE ESTIMATION

The two techniques rely on a sequential analysis (two-stage analysis), so the variance calculation should account for the uncertainty in both the first and the second stage. Not taking into account the first-stage uncertainty by specifying varest="none" may lead to underestimation of the final variance. When possible, varest="Hessian", which relies on the combination of the Hessians from the primary and secondary models, is recommended. However, it may become computationally intensive when the primary latent class model includes a large number of parameters. As an alternative, especially when the primary model is complex and the secondary model includes a limited number of parameters, the parametric bootstrap method varest="paramBoot" can be preferred.
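
The toy example below illustrates, in base R, why ignoring first-stage uncertainty underestimates the variance and how a parametric bootstrap over the first-stage estimate can recover it. It is a generic sketch and not the implementation used by externVar: the two-stage estimator, the simulated data and the rule "average second-stage variance plus between-draw variance" are all assumptions made for illustration.

```r
set.seed(1)
n <- 100
z <- rnorm(n, mean = 2)  # data used in stage 1
y <- rnorm(n, mean = 5)  # data used in stage 2

# Stage 1: estimate a nuisance quantity and its variance
c_hat <- mean(z)
V1    <- var(z) / n

# Stage 2: estimator and variance, conditional on a stage-1 value cc
stage2 <- function(cc) {
  r <- y - cc
  list(est = mean(r), var = var(r) / n)
}

# Naive variance: stage-1 estimate treated as known (analogue of varest = "none")
naive <- stage2(c_hat)

# Parametric bootstrap: redraw the stage-1 estimate M times from its
# asymptotic normal distribution and redo stage 2 for each draw
M     <- 200
draws <- rnorm(M, mean = c_hat, sd = sqrt(V1))
boot  <- lapply(draws, stage2)
est_m <- vapply(boot, function(b) b$est, numeric(1))
var_m <- vapply(boot, function(b) b$var, numeric(1))

# Total variance: average second-stage variance + variance between draws
total_var <- mean(var_m) + var(est_m)
c(naive = naive$var, total = total_var)
```

In this toy setting the between-draw term approximates the stage-1 variance `V1`, which is exactly the part that the naive variance leaves out.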

Value

an object of class externVar and externSurv for external survival outcomes, externX for external class predictors, and hlme, lcmm, or multlcmm for external longitudinal or cross-sectional outcomes.

Author(s)

Maris Dussartre, Cecile Proust-Lima and Viviane Philipps

Examples


## Not run: 


###### Estimation of the primary latent class model                   ######
# this is a linear latent class mixed model for Ydep1
# with 2 classes and a linear trajectory

library(lcmm)   # provides hlme(), externVar() and the data_lcmm data set
set.seed(1234)
PrimMod <- hlme(Ydep1~Time,random=~Time,subject='ID',ng=1,data=data_lcmm)
PrimMod2 <- hlme(Ydep1~Time,mixture=~Time,random=~Time,subject='ID',
                 ng=2,data=data_lcmm,B=random(PrimMod))

###### Example 1: Relationship between the latent class structure and       #
#                   external class predictors                          ######
      
# We consider here 4 external predictors X1-X4.       
                  
# estimation of the secondary multinomial logistic model with total variance
# computed with the Hessian

XextHess <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4, 
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint") 
summary(XextHess)

# estimation of a secondary multinomial logistic model with total variance
# computed with parametric Bootstrap (much longer). When planning to use
# the bootstrap estimator, we recommend running first the analysis 
# with option varest = "none" which is faster but which underestimates 
# the variance. And then use these values as plausible initial values when 
# running the estimation with varest = "paramBoot" to obtain  a valid 
# variance of the parameters. 

XextNone <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4, 
                      subject = "ID",
                      data = data_lcmm,
                      varest = "none",
                      method = "twoStageJoint") 

XextBoot <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4, 
                      subject = "ID",
                      data = data_lcmm,
                      varest = "paramBoot",
                      method = "twoStageJoint",
                      B = XextNone$best) 
summary(XextBoot)

 
###### Example 2: Relationship between a latent class structure and         #
#                external outcome (repeatedly measured over time)     ######
                
                
# We want to estimate a linear mixed model for Ydep2 with a linear trajectory
# adjusted on X1. 
  
# estimation of the secondary linear mixed model with total variance
# computed with the Hessian

YextHess = externVar(PrimMod2,   #primary model
                     fixed = Ydep2 ~ Time*X1,  #secondary model
                     random = ~Time, #secondary model
                     mixture = ~Time,  #secondary model
                     subject="ID",
                     data=data_lcmm,
                     method = "twoStageJoint")
                     

# estimation of a secondary linear mixed model with total variance
# computed with parametric Bootstrap (much longer). When planning to use
# the bootstrap estimator, we recommend running first the analysis 
# with option varest = "none" which is faster but which underestimates 
# the variance. And then use these values as plausible initial values when 
# running the estimation with varest = "paramBoot" to obtain  a valid 
# variance of the parameters. 

YextNone = externVar(PrimMod2,   #primary model
                     fixed = Ydep2 ~ Time*X1,  #secondary model
                     random = ~Time, #secondary model
                     mixture = ~Time,  #secondary model
                     subject="ID",
                     data=data_lcmm,
                     varest = "none",
                     method = "twoStageJoint")

YextBoot = externVar(PrimMod2,   #primary model
                     fixed = Ydep2 ~ Time*X1,  #secondary model
                     random = ~Time, #secondary model
                     mixture = ~Time,  #secondary model
                     subject="ID",
                     data=data_lcmm,
                     method = "twoStageJoint",
                     B = YextNone$best,
                     varest= "paramBoot")

summary(YextBoot) 


###### Example 3: Relationship between a latent class structure and         #
#                      external outcome (survival)                     ######

# We want to estimate a proportional hazard model (with proportional hazard 
# across classes) for time to event Tevent (indicator Event) and assuming 
# a splines baseline risk with 3 knots.

# estimation of the secondary survival model with total variance
# computed with the Hessian

YextHess = externVar(PrimMod2,   #primary model
                     survival = Surv(Tevent,Event)~ X1+mixture(X2), #secondary model
                     hazard="3-quant-splines", #secondary model
                     hazardtype="PH", #secondary model
                     subject="ID",
                     data=data_lcmm,
                     method = "twoStageJoint")
summary(YextHess)


# estimation of a secondary survival model with total variance
# computed with parametric Bootstrap (much longer). When planning to use
# the bootstrap estimator, we recommend running first the analysis 
# with option varest = "none" which is faster but which underestimates 
# the variance. And then use these values as plausible initial values when 
# running the estimation with varest = "paramBoot" to obtain  a valid 
# variance of the parameters. 

YextNone = externVar(PrimMod2,   #primary model
                     survival = Surv(Tevent,Event)~ X1+mixture(X2), #secondary model
                     hazard="3-quant-splines", #secondary model
                     hazardtype="PH", #secondary model
                     subject="ID",
                     data=data_lcmm,
                     varest = "none",
                     method = "twoStageJoint")

YextBoot = externVar(PrimMod2,   #primary model
                     survival = Surv(Tevent,Event)~ X1+mixture(X2), #secondary model
                     hazard="3-quant-splines", #secondary model
                     hazardtype="PH", #secondary model
                     subject="ID",
                     data=data_lcmm,
                     method = "twoStageJoint",
                     B = YextNone$best,
                     varest= "paramBoot")

summary(YextBoot)


## End(Not run)

lcmm documentation built on April 3, 2025, 8:54 p.m.