merlin: merlin - Mixed Effects Regression for Linear, Nonlinear and...

Description Usage Arguments Author(s) References See Also Examples

View source: R/merlin.R

Description

merlin fits linear, non-linear and user-defined mixed effects regression models. merlin can fit multivariate outcome models of any type, each of which could be repeatedly measured (longitudinal), with any number of levels, and with any number of random effects at each level. Standard distributions/models available include the Bernoulli, Gaussian, Poisson, beta, negative-binomial, and time-to-event/survival models include the exponential, Gompertz, Weibull, Royston-Parmar, and general hazard model. merlin provides a flexible predictor syntax, allowing the user to define variables, random effects, spline and fractional polynomial functions, functions of other outcome models, and any interaction between each of them. Non-linear and time-dependent effects are seamlessly incorporated into the predictor. merlin allows multivariate normal random effects, which are integrated out using Gaussian quadrature or Monte-Carlo integration. Relative survival (excess hazard) models are supported. Utility functions are provided to allow user-defined models to be specified, in conjunction with the complex predictor.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
merlin(
  model,
  from = NULL,
  family = "gaussian",
  link = NULL,
  timevar = NULL,
  covariance = "diagonal",
  data,
  userf = NULL,
  sweights = NULL,
  levels = NULL,
  predict = FALSE,
  predtype = NULL,
  predmodel = NULL,
  causes = NULL,
  at = NULL,
  contrast = NULL,
  modelfit = NULL,
  control = list()
)

Arguments

model

specify the fixed and random elements for each model outcome. Where there are multiple outcomes, the models should be specified in a list. Each model should be specified as a formula (e.g. y~x). A number of different element types can be specified, including

  • varname - an independent variable from the data set

  • random-effects - a random-effect at the cluster level can be specified using M#[cluster level], for example M1[id] would define a random intercept at the ID level. Each independent random-effect should be given a unique name, if two random-effects are given the same name they will be treated as shared random-effects.

  • rcs() - restricted cubic spline terms, this option can be used to include a restricted cubic spline function, with the degrees of freedom (number of spline terms) specified using the df sub-option, with the boundary knots assumed to be at the minimum and maximum of the variable, with internal knots placed at equally spaced centiles. Other default options orthog = TRUE, which by default orthogonalises the splines, log = FALSE, which can be used to calculate splines of the log of the variable and event = F, which can be used to calculate the internal knots based only on observations that experienced the event of interest (for survival models).

  • srcs() is a shorthand element, equivalent to rcs(..., log = TRUE, event = TRUE), for use with survival models.

  • fp() - fractional polynomials of order 1 or 2 can be specified, the sub-option powers is used to specify the powers of the fp model.

  • EV[depvar] - the expected value of the response of a submodel.

  • dEV[depvar] - the first derivative with respect to time of the expected value of the response of a submodel.

  • d2EV[depvar] - the second derivative with respect to time of the expected value of the response of a submodel.

  • iEV[depvar] - the integral with respect to time of the expected value of the response of a submodel.

  • XB[depvar] - the expected value of the complex predictor of a submodel.

  • dXB[depvar] - the first derivative with respect to time of the expected value of the complex predictor of a submodel.

  • d2XB[depvar] - the second derivative with respect to time of the expected value of the complex predictor of a submodel.

  • iXB[depvar] - the integral with respect to time of the expected value of the complex predictor of a submodel.

  • bhazard(varname) - invokes a relative survival (excess hazard) model. varname specifies the expected hazard rate at the event time.

  • exposure(varname) - include log(varname) in the linear predictor, with a coefficient of 1. For use with family = "poisson".

  • ap(#) - defines the number of ancillary parameters. Used with family="user".

from

this is an optional argument giving the initial values for the full parameter vector, for more details on how to specify the initial estimates see the vignette.

family

a vector of strings specifying the family for each outcome specified in model. The currently available models include,

  • gaussian - Gaussian distribution

  • bernoulli - Bernoulli distribution

  • poisson - Poisson distribution

  • beta - Beta distribution

  • negbinomial - Negative binomial distribution

with survival models including,

  • exponential - exponential distribution

  • weibull - Weibull distribution

  • gompertz - Gompertz distribution

  • rp - Royston-Parmar model (complex predictor on the log cumulative hazard scale)

  • loghazard - general log hazard model (complex predictor on the log hazard scale)

and user-defined,

  • user - fit a user-defined model, which should be written using merlin's utility functions. The name of your user-defined function should be passed through the userf option.

  • null - is a convenience tool to define additional complex predictors, that do not contribute to the log likelihood. For use with family = "user".

link

string vector defining the link functions for each model. Default is "identity" for all models. If specified, you must define a link function for all submodels. Options include "identity", "log" and "logit".

timevar

specifies the variable which represents time, this is necessary when a function of time is used in the linear predictor of a survival model as it may interact with other elements of the model.

covariance

the structure of the variance-covariance matrix can be varied, the default is diagonal where all diagonal elements of the variance-covariance matrix are estimated uniquely, identity assumes all diagonal elements are equal and unstructured estimates all elements of the variance-covariance matrix.

data

a data frame containing all variables required for fitting the model. Can be a tibble object.

userf

string vector defining the name of the user-written functions for each family="user". Each function must be in memory and should return the observation level contribution to the log-likelihood.

sweights

Not documented.

levels

if the model contains random-effects then a vector giving the order of levels must be specified, from the highest level to the lowest, e.g. levels=c("practice","id").

predict

Not documented.

predtype

Not documented.

predmodel

Not documented.

causes

Not documented.

at

Not documented.

contrast

Not documented.

modelfit

Not documented.

control

A list of parameters that control the estimation algorithm. Generally it should not be modified, unless there are convergence issues. Possible values are:

  • ip An optional vector of integers specifying the number of integration points to be used when integrating out the random effects. A different number of ip can be specified for each level (from highest to lowest level). If only a single number is given then merlin will assume that number of integration points at all levels. Default is ip = 7 for each level using Gauss-Hermite quadrature, or ip = 100 for each level using Monte-Carlo integration;

  • intmethod The method used for numerically integrating out the random-effects in order to calculate the likelihood for a mixed effects model which includes random effects. Options include ghermite for non-adaptive Gauss-Hermite quadrature, halton for Monte-Carlo integration using Halton sequences, sobol for Monte-Carlo integration using Sobol sequences, or mc for standard Monte-Carlo integration using normal draws. The default is ghermite. Level-specific integration techniques can be specified, for example, with a three level model, we may use Gauss-Hermite quadrature at the highest level, and Monte-Carlo integration with Halton sequences at level 2, using intmethod = c("ghermite","halton").

  • debug Not documented.

  • verbose Not documented.

  • optim.method The optim method to be used. Defaults to Nelder-Mead, see optim for available methods.

  • maxit The maximum number of iterations for the optimisation routine. Defaults to 5000.

Author(s)

Emma C. Martin, Alessandro Gasparini and Michael J. Crowther

References

Crowther MJ. Extended multivariate generalised linear and non-linear mixed effects models. https://arxiv.org/abs/1710.02223

Crowther MJ. merlin - a unified framework for data analysis and methods development in Stata. https://arxiv.org/abs/1806.01615

Martin EC, Gasparini A, Crowther MJ. merlin - an R package for mixed effects regression of linear, non-linear and user-defined models.

See Also

predict.merlin

merlin_util_depvar, merlin_util_timevar, merlin_util_xzb, merlin_util_xzb_mod merlin_util_xzb_deriv, merlin_util_xzb_deriv_mod merlin_util_xzb_deriv2, merlin_util_xzb_deriv2_mod merlin_util_xzb_integ, merlin_util_xzb_integ_mod merlin_util_ev, merlin_util_ev_mod merlin_util_ev_deriv, merlin_util_ev_deriv_mod merlin_util_ev_deriv2, merlin_util_ev_deriv2_mod merlin_util_ev_integ, merlin_util_ev_integ_mod merlin_util_ap, merlin_util_ap_mod

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
## Not run: 
library(merlin)
data(pbc.merlin, package = "merlin")

# Linear fixed-effects model
merlin(logb ~ year,
       family = "gaussian",
       data = pbc.merlin)

# Linear mixed-effects model with random intercept and slope at ID level
merlin(logb ~ year + M1[id] * 1 + year:M2[id] * 1,
       family = "gaussian",
       levels = "id",
       data = pbc.merlin)

# Joint longitudinal and survival model with shared random effects
merlin(model = list(logb ~ year + M1[id] * 1,
                    Surv(stime, died) ~ trt + M1[id]),
       family = c("gaussian", "weibull"),
       levels = "id",
       data = pbc.merlin)

# Joint longitudinal and survival model with expected value
merlin(model = list(logb ~ year + M1[id] * 1,
                    Surv(stime, died) ~ trt + EV[logb]),
       family = c("gaussian", "weibull"),
       levels = "id",
       timevar = c("year","stime"),
       data = pbc.merlin)

# Gaussian distribution - implemented as a user-written family
logl_gaussian <- function(gml)
{
  y    <- merlin_util_depvar(gml)
  xzb  <- merlin_util_xzb(gml)
  se   <- exp(merlin_util_ap(gml,1))

  mu   <- (sweep(xzb,1,y,"-"))^2
  logl <- ((-0.5 * log(2*pi) - log(se)) - (mu/(2 * se^2)))
  return(logl)
}

merlin(logb ~ year + ap(1), family = "user", data = pbc.merlin,
                                  userf = "logl_gaussian")

# 3-level Weibull model
merlin(Surv(stime1,dead1) ~ age + M1[id1]*1 + M2[id2]*1,
       levels=c("id1","id2"), family="weibull", data=sim3)

## End(Not run)

merlin documentation built on July 8, 2020, 7:31 p.m.