View source: R/model.formulas.update.r
| model.formulas.update | R Documentation |
Wrapper function that applies variable screening to all models generated by make.model.formulas and returns the updated formulas in the format required by gformula.
model.formulas.update(formulas, X, screening = screen.glmnet.cramer,
                      with.s = FALSE, by = NA, ...)
formulas |
A named list of length 4 containing model formulas for all Y-, L-, A- and C-nodes. These are likely formulas returned from |
X |
A data frame on which the model formulas are to be evaluated. |
screening |
A screening function. Default is |
with.s |
Logical. If TRUE, a spline term, i.e. s(), is added for all continuous variables. |
by |
A character vector specifying the variables with which to multiply the smooth (if |
... |
Optional arguments to be passed to the screening algorithm. |
The default screening algorithm uses LASSO for variable screening and falls back to Cramer's V, computed on the categorized version of all variables, if LASSO fails. It is possible to provide user-specific screening algorithms.
User-specific algorithms should take the data as the first argument and one model formula (i.e. one entry of the list in model.formulas) as the second argument, and return a vector of strings containing the names of the variables that remain after screening. Another screening algorithm available in the package is screen.cramersv, which categorizes all variables, calculates their association with the outcome based on Cramer's V, and selects the 4 variables with the strongest associations (this number can be changed with the option nscreen).
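The interface for a user-specific screening algorithm described above can be sketched as follows. This is a minimal illustration, not code from the package: the function name screen.correlation, the ranking by absolute Pearson correlation, and the toy data are all assumptions made for the example. It further assumes that the outcome is numeric and that the formula is supplied as a string or a formula object.

```r
# Hypothetical user-written screening function (NOT part of the package).
# Interface as described above: data first, one model formula second,
# returns a character vector with the names of the retained variables.
screen.correlation <- function(data, formula, nscreen = 2, ...) {
  f          <- stats::as.formula(formula)   # accepts a string or a formula
  outcome    <- all.vars(f)[1]               # left-hand side variable
  candidates <- setdiff(all.vars(f), outcome)
  # keep numeric candidates only; a real screener would also handle factors
  is.num     <- vapply(data[candidates], is.numeric, logical(1))
  candidates <- candidates[is.num]
  # absolute Pearson correlation of each candidate with the outcome
  assoc <- vapply(candidates, function(v) {
    abs(stats::cor(data[[v]], data[[outcome]], use = "pairwise.complete.obs"))
  }, numeric(1))
  # names of the nscreen most strongly associated variables
  k <- min(nscreen, length(assoc))
  as.character(names(sort(assoc, decreasing = TRUE))[seq_len(k)])
}

# Toy data (made up for illustration): y depends strongly on x1
toy <- data.frame(x1 = 1:10,
                  x2 = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3),
                  x3 = rep(c(0, 1), 5))
toy$y <- 2 * toy$x1 + toy$x3

screen.correlation(toy, "y ~ x1 + x2 + x3", nscreen = 2)
```

A function of this shape would be supplied through the screening argument, with extra options such as nscreen forwarded via `...`. Whether this particular sketch works on a given data set (for example with categorical outcomes) has not been checked here.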
The manual provides more information.
The fits of the updated models can be evaluated with fit.updated.formulas.
A list of length 4 containing the updated model formulas:
Lnames |
A vector of strings containing updated model formulas for all L nodes. |
Ynames |
A vector of strings containing updated model formulas for all Y nodes. |
Anames |
A vector of strings containing updated model formulas for all A nodes. |
Cnames |
A vector of strings containing updated model formulas for all C nodes. |
make.model.formulas, model.update, fit.updated.formulas
data(EFV)
# first: generate generic model formulas
m <- make.model.formulas(X = EFV,
                         Lnodes = c("adherence.1", "weight.1",
                                    "adherence.2", "weight.2",
                                    "adherence.3", "weight.3",
                                    "adherence.4", "weight.4"),
                         Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                         Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                         evaluate = FALSE)
# second: update these model formulas based on variable screening with LASSO
glmnet.formulas <- model.formulas.update(m$model.names, EFV)
glmnet.formulas
# third: use these models for estimation
est <- gformula(X = EFV,
                Lnodes = c("adherence.1", "weight.1",
                           "adherence.2", "weight.2",
                           "adherence.3", "weight.3",
                           "adherence.4", "weight.4"),
                Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                Yform = glmnet.formulas$Ynames, Lform = glmnet.formulas$Lnames,
                abar = seq(0, 2, 1))
est