View source: R/preprocess_MV.R
fitmv | R Documentation |
This function extends the fitme
function to fit a joint model for different responses (following possibly different response families) sharing some random-effects, including a new type of random effect defined to exhibit correlations across different responses (see mv
).
It is also possible to declare shared fixed-effect coefficients among different submodels, using the X2X
argument.
Only a few features available for analysis of univariate response may not yet work (see Details).
fitmv(submodels, data, fixed=NULL, init=list(), lower=list(), upper=list(),
control=list(), control.dist = list(), method="ML", init.HLfit=list(),
X2X=NULL, ...)
submodels |
A list of sublists each specifying a model for each univariate response. The names given to each submodel in the main list are currently ignored. The names and syntax of elements within each sublist are those of a
|
data |
A data frame containing the variables in the response and the model formulas. |
fixed |
A list of fixed values of the parameters controlling random effects. The syntax is that of the same argument in |
init , lower , upper |
Lists of initial values or bounds. The syntax is that of the same arguments in |
control |
A list of control parameters, with possible elements as described for |
control.dist |
See |
method |
Character: the fitting method to be used, such as |
init.HLfit |
See identically named |
X2X |
NULL, or a matrix M by which one can specify, as |
... |
Optional arguments passed to (or operating as if passed to) |
Matching random effects across submodels, and referring to them;
Random effects are recognized as identical across submodels by matching the formula terms. As shown in the Examples, if the two models formulas share the (1|clinic)
term, this term is recognized as a single random effect shared between the two responses. But the (1|clinic)
and (+1|clinic)
terms are recognized as distinct random effects. In that case, the init
argument init=list(lambda=c('1'=1,'2'=0.5))
is shown to refer to these by names 1,2
... where the order is defined as the order of first appearance of the terms across the model formulas in the order of the submodels
list.
Alternatively, the syntax fixed=list(lambda=c('clinic.1'=0.5,'clinic'=1))
works: this syntax makes order of input irrelevant but assumes that the user guesses names correctly (these are typically the names that appear in the summary of lambda values from the fit object or, more programmatically,
names(<fit object>$lambda.object$print_namesTerms)
). Finally, fixed values of parameters can also be specified through each sub-model, with indices referring to the order of random effects with each model.
The matching of random-effect terms occurs after expansion of multIMRF
terms, if any. This may have subtle consequences if two multIMRF terms differ only by their number of levels, as some of the expanded IMRF terms are then shared.
Capacities and limitations:
Practically all features of models that can be fitted by fitme
should be available: this includes all combinations of GLM response families, residual dispersion models, and all types of random-effect terms, whether autocorrelated or not. Among the arguments handled through the ..., covStruct
, distMatrix
, corrMatrix
should be effective; control.HLfit$LevenbergM
and verbose=c(TRACE=TRUE)
will work but some other controls available in fitme
may not. Usage of the REMLformula
argument is restricted as it cannot be used to specify a non-standard REML correction (but the more useful keepInREML
attribute for fixed fixed-effect coefficients is handled).
The multi
family-like syntax for multinomial models should not be used, but fitmv
could provide other means to model multinomial responses.
Most post-fit functions work, at least with default arguments. This includes point predict
ion and prediction variances calculations sensu lato, including with newdata
; but also simulate
, spaMM_boot
, confint
, anova
, update_resp
, and update
. The re.form
argument now works for predict
and simulate
. Bootstrap computation may require special care for models where simulation of one response variable may depend on draws of another one (see Hurdle model example in the “Gentle introdution” to spaMM,
https://gitlab.mbb.univ-montp2.fr/francois/spamm-ref/-/blob/master/vignettePlus/spaMMintro.pdf).
Prediction functions try to handle most forms of missing information in newdata
(including information missing for a residual-dispersion model when predcitiosn fro mit are needed: see Examples). As information may be missing for some submodels but not others, different numbers of predictions are then returned for different submodels. As for univariate-response models, predict
will return point predictions as a single 1-column matrix, here concatenating the prediction results of the different submodels. The nobs
attribute specifies how may values pertain to each submodel.
Some plotting functions may fail. update.formula
fails (see update_formulas
for details). terms
returns a list, which is not usable by other base R functions. stats::step
is a good example of resulting limitations, as it is currently unable to perform any sensible operation on fitmv
output. spaMM::MSFDR
which calls stats::step
likewise fails.
multcomp::glht
fails.
A perhaps not entirely satisfying feature is that simulate
by default stacks the results of simulating each submodel in a single vector. Some non-trivial reformatting may then be required to include such simulation results in a suitable newdata
data frame with (say) sufficient information for prediction of all responses. The syntax
update_resp(<fit>, newresp = simulate(<fit>, ...), evaluate = FALSE)$data
may be particularly useful to reformat simulation results in this perspective.
Which arguments belong to submodels
?:
Overall, arguments specifying individual submodels should go into submodels
, while other arguments of fitmv
should be those potentially affecting several submodels (notably, random-effect structures, lower
, and upper
) and fitting controls (such as init
and init.HLfit
). One rarely-used exception is REMLformula
which controls the fitting method but should be specified through the submodels
.
The function proceeds by first preprocessing all submodels independently, before merging the resulting information by matching random effects across submodels. The merging operation includes some checks of consistency across submodels, implying that redundant arguments may be needed across submodels (e.g. specifying twice a non-default rand.family
for a random effect shared by two submodels).
A (single) list of class HLfit
, as returned by other fitting functions in spaMM. The main difference is that it contains a families
element describing the response families, instead of the family
elements of fitted objects for univariate response.
See further examples in mv
(modelling correlated random effects over the different submodels),
and residVar
.
### Data preparation
data(clinics)
climv <- clinics
(fitClinics <- HLfit(cbind(npos,nneg)~treatment+(1|clinic),
family=binomial(),data=clinics))
set.seed(123)
climv$np2 <- simulate(fitClinics, type="residual")
#
### fits
#### Shared random effect
(mvfit <- fitmv(
submodels=list(mod1=list(formula=cbind(npos,nneg)~treatment+(1|clinic),family=binomial()),
mod2=list(formula=np2~treatment+(1|clinic),
family=poisson(), fixed=list(lambda=c("1"=1)))),
data=climv))
# Two univariate-response independent fits because random effect terms are distinct
# (note how two lambda values are set; same syntax for 'init' values):
(mvfitind <- fitmv(
submodels=list(mod1=list(formula=cbind(npos,nneg)~treatment+(1|clinic),family=binomial()),
mod2=list(formula=np2~treatment+(+1|clinic),family=poisson())),
data=climv, fixed=list(lambda=c('1'=1,'2'=0.5)))) # '1': (1|clinic); '2': (+1|clinic)
#### Specifying fixed (but not init) values in submodels is also possible (maybe not a good idea)
# (mvfitfix <- fitmv(
# submodels=list(mod1=list(formula=cbind(npos,nneg)~treatment+(1|clinic),
# family=binomial(),fixed=list(lambda=c('1'=1))), # '1': (1|clinic)
# mod2=list(formula=np2~treatment+(+1|clinic),family=poisson(),
# fixed=list(lambda=c('1'=0.5)))), # '2': (+1|clinic)
# data=climv))
#### Shared fixed effect
# Suppose we want to fit the same intercept for the two submodels
# (there may be cases where this is meaningful, even if not here).
# The original fit has four coefficients corresponding to four columns
# of fixed-effect design matrix:
head(design_X <- model.matrix(mvfit))
# (Intercept)_1 treatment_1 (Intercept)_2 treatment_2
# [1,] 1 1 0 0
# ...
# The three coefficients of the intended model are (say)
# "(Intercept)" "treatment_1" "treatment_2"
# We build a matrix that relates the original 4 coefficients to these 3 ones:
X_4to3 <-
matrix(c(1,0,0,
0,1,0,
1,0,0,
0,0,1), nrow=4, ncol=3, byrow=TRUE,
dimnames=list(NULL, c("(Intercept)","treatment_1","treatment_2")))
# defined such that design_X %*% X_4to3 will be the design matrix for the intended model,
# and the single "(Intercept)" coefficient of the three-parameter model will operate as
# a shared estimate of the "(Intercept)_1" and "(Intercept)_2" coefficients
# of the original 4-coefficients model, as intended.
# To define such matrices, it is *strongly advised* to fit the unconstrained model first,
# and to examine the structure of its model matrix (as shown above).
# The new fit is obtained by providing the matrix as the 'X2X' argument:
(mvfit3 <- fitmv(
submodels=list(mod1=list(formula=cbind(npos,nneg)~treatment+(1|clinic),family=binomial()),
mod2=list(formula=np2~treatment+(1|clinic),
family=poisson(), fixed=list(lambda=c("1"=1)))),
X2X = X_4to3,
data=climv))
# => the column names of 'X_4to3' are the fixed-effect names in all output.
#### Prediction with a residual-dispersion model
set.seed(123)
beta_dat <- data.frame(y=runif(100),grp=sample(2,100,replace = TRUE), x_het=runif(100),
y2=runif(100))
(mvfit <- fitmv(list(list(y ~1+(1|grp), family=beta_resp(), resid.model = ~x_het),
list(y2 ~1+(1|grp), family=beta_resp())),
data= beta_dat))
misspred <- beta_dat[1:3,]
misspred$x_het[1] <- NA # missing info for residual variance of first submodel
## => prediction missing when this info is needed:
#
length(predict(mvfit, newdata=misspred)) # 6 values: missing info not needed for point predictions
length(get_residVar(mvfit, newdata=misspred)) # 5 values
length(get_respVar(mvfit, newdata=misspred)) # 5 values
# Missing info not needed for predVar (**as opposed to respVar**)
length(get_predVar(mvfit, newdata=misspred)) # 6 values
#
# Same logic for interval computations:
#
dim(attr(predict(mvfit, newdata=misspred, intervals="respVar"),"intervals")) # 5,2
dim(attr(predict(mvfit, newdata=misspred, intervals="predVar"),"intervals")) # 6,2
#
# Same logic for simulate():
#
length(simulate(mvfit, newdata=misspred)) # 5 as simulation requires residVar
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.