merlin: merlin - Mixed Effects Regression for Linear, Nonlinear and...
In merlin: Mixed Effects Regression for Linear, Non-Linear and User-Defined Models

Description Usage Arguments Author(s) References See Also Examples

merlin fits linear, non-linear and user-defined mixed effects regression models. merlin can fit multivariate outcome models of any type, each of which could be repeatedly measured (longitudinal), with any number of levels, and with any number of random effects at each level. Standard distributions/models available include the Bernoulli, Gaussian, Poisson, beta, negative-binomial, and time-to-event/survival models include the exponential, Gompertz, Weibull, Royston-Parmar, and general hazard model. merlin provides a flexible predictor syntax, allowing the user to define variables, random effects, spline and fractional polynomial functions, functions of other outcome models, and any interaction between each of them. Non-linear and time-dependent effects are seamlessly incorporated into the predictor. merlin allows multivariate normal random effects, which are integrated out using Gaussian quadrature or Monte-Carlo integration. Relative survival (excess hazard) models are supported. Utility functions are provided to allow user-defined models to be specified, in conjunction with the complex predictor.

merlin(
  model,
  from = NULL,
  family = "gaussian",
  link = NULL,
  timevar = NULL,
  covariance = "diagonal",
  data,
  userf = NULL,
  sweights = NULL,
  levels = NULL,
  predict = FALSE,
  predtype = NULL,
  predmodel = NULL,
  causes = NULL,
  at = NULL,
  contrast = NULL,
  modelfit = NULL,
  control = list()
)

`model`	specify the fixed and random elements for each model outcome. Where there are multiple outcomes, the models should be specified in a list. Each model should be specified as a formula (e.g. `y~x`). A number of different element types can be specified, including varname - an independent variable from the data set random-effects - a random-effect at the cluster level can be specified using `M#[cluster level]`, for example `M1[id]` would define a random intercept at the ID level. Each independent random-effect should be given a unique name, if two random-effects are given the same name they will be treated as shared random-effects. `rcs()` - restricted cubic spline terms, this option can be used to include a restricted cubic spline function, with the degrees of freedom (number of spline terms) specified using the `df` sub-option, with the boundary knots assumed to be at the minimum and maximum of the variable, with internal knots placed at equally spaced centiles. Other default options `orthog = TRUE`, which by default orthogonalises the splines, `log = FALSE`, which can be used to calculate splines of the log of the variable and `event = F`, which can be used to calculate the internal knots based only on observations that experienced the event of interest (for survival models). `srcs()` is a shorthand element, equivalent to `rcs(..., log = TRUE, event = TRUE)`, for use with survival models. `fp()` - fractional polynomials of order 1 or 2 can be specified, the sub-option `powers` is used to specify the powers of the `fp` model. `EV[depvar]` - the expected value of the response of a submodel. `dEV[depvar]` - the first derivative with respect to time of the expected value of the response of a submodel. `d2EV[depvar]` - the second derivative with respect to time of the expected value of the response of a submodel. `iEV[depvar]` - the integral with respect to time of the expected value of the response of a submodel. `XB[depvar]` - the expected value of the complex predictor of a submodel. `dXB[depvar]` - the first derivative with respect to time of the expected value of the complex predictor of a submodel. `d2XB[depvar]` - the second derivative with respect to time of the expected value of the complex predictor of a submodel. `iXB[depvar]` - the integral with respect to time of the expected value of the complex predictor of a submodel. `bhazard(varname)` - invokes a relative survival (excess hazard) model. `varname` specifies the expected hazard rate at the event time. `exposure(varname)` - include `log(varname)` in the linear predictor, with a coefficient of 1. For use with `family = "poisson"`. `ap(#)` - defines the number of ancillary parameters. Used with `family="user"`.
`from`	this is an optional argument giving the initial values for the full parameter vector, for more details on how to specify the initial estimates see the vignette.
`family`	a vector of strings specifying the family for each outcome specified in model. The currently available models include, `gaussian` - Gaussian distribution `bernoulli` - Bernoulli distribution `poisson` - Poisson distribution `beta` - Beta distribution `negbinomial` - Negative binomial distribution with survival models including, `exponential` - exponential distribution `weibull` - Weibull distribution `gompertz` - Gompertz distribution `rp` - Royston-Parmar model (complex predictor on the log cumulative hazard scale) `loghazard` - general log hazard model (complex predictor on the log hazard scale) and user-defined, `user` - fit a user-defined model, which should be written using `merlin`'s utility functions. The name of your user-defined function should be passed through the `userf` option. `null` - is a convenience tool to define additional complex predictors, that do not contribute to the log likelihood. For use with `family = "user"`.
`link`	string vector defining the link functions for each model. Default is `"identity"` for all models. If specified, you must define a link function for all submodels. Options include `"identity"`, `"log"` and `"logit"`.
`timevar`	specifies the variable which represents time, this is necessary when a function of time is used in the linear predictor of a survival model as it may interact with other elements of the model.
`covariance`	the structure of the variance-covariance matrix can be varied, the default is `diagonal` where all diagonal elements of the variance-covariance matrix are estimated uniquely, `identity` assumes all diagonal elements are equal and `unstructured` estimates all elements of the variance-covariance matrix.
`data`	a data frame containing all variables required for fitting the model. Can be a `tibble` object.
`userf`	string vector defining the name of the user-written functions for each `family="user"`. Each function must be in memory and should return the observation level contribution to the log-likelihood.
`sweights`	Not documented.
`levels`	if the model contains random-effects then a vector giving the order of levels must be specified, from the highest level to the lowest, e.g. `levels=c("practice","id")`.
`predict`	Not documented.
`predtype`	Not documented.
`predmodel`	Not documented.
`causes`	Not documented.
`at`	Not documented.
`contrast`	Not documented.
`modelfit`	Not documented.
`control`	A list of parameters that control the estimation algorithm. Generally it should not be modified, unless there are convergence issues. Possible values are: `ip` An optional vector of integers specifying the number of integration points to be used when integrating out the random effects. A different number of `ip` can be specified for each `level` (from highest to lowest level). If only a single number is given then `merlin` will assume that number of integration points at all levels. Default is `ip = 7` for each level using Gauss-Hermite quadrature, or `ip = 100` for each level using Monte-Carlo integration; `intmethod` The method used for numerically integrating out the random-effects in order to calculate the likelihood for a mixed effects model which includes random effects. Options include `ghermite` for non-adaptive Gauss-Hermite quadrature, `halton` for Monte-Carlo integration using Halton sequences, `sobol` for Monte-Carlo integration using Sobol sequences, or `mc` for standard Monte-Carlo integration using normal draws. The default is `ghermite`. Level-specific integration techniques can be specified, for example, with a three level model, we may use Gauss-Hermite quadrature at the highest level, and Monte-Carlo integration with Halton sequences at level 2, using `intmethod = c("ghermite","halton")`. `debug` Not documented. `verbose` Not documented. `optim.method` The `optim` method to be used. Defaults to Nelder-Mead, see optim for available methods. `maxit` The maximum number of iterations for the optimisation routine. Defaults to 5000.

Emma C. Martin, Alessandro Gasparini and Michael J. Crowther

Crowther MJ. Extended multivariate generalised linear and non-linear mixed effects models. https://arxiv.org/abs/1710.02223

Crowther MJ. merlin - a unified framework for data analysis and methods development in Stata. https://arxiv.org/abs/1806.01615

Martin EC, Gasparini A, Crowther MJ. merlin - an R package for mixed effects regression of linear, non-linear and user-defined models.

predict.merlin

merlin_util_depvar, merlin_util_timevar, merlin_util_xzb, merlin_util_xzb_mod merlin_util_xzb_deriv, merlin_util_xzb_deriv_mod merlin_util_xzb_deriv2, merlin_util_xzb_deriv2_mod merlin_util_xzb_integ, merlin_util_xzb_integ_mod merlin_util_ev, merlin_util_ev_mod merlin_util_ev_deriv, merlin_util_ev_deriv_mod merlin_util_ev_deriv2, merlin_util_ev_deriv2_mod merlin_util_ev_integ, merlin_util_ev_integ_mod merlin_util_ap, merlin_util_ap_mod

## Not run: 
library(merlin)
data(pbc.merlin, package = "merlin")

# Linear fixed-effects model
merlin(logb ~ year,
       family = "gaussian",
       data = pbc.merlin)

# Linear mixed-effects model with random intercept and slope at ID level
merlin(logb ~ year + M1[id] * 1 + year:M2[id] * 1,
       family = "gaussian",
       levels = "id",
       data = pbc.merlin)

# Joint longitudinal and survival model with shared random effects
merlin(model = list(logb ~ year + M1[id] * 1,
                    Surv(stime, died) ~ trt + M1[id]),
       family = c("gaussian", "weibull"),
       levels = "id",
       data = pbc.merlin)

# Joint longitudinal and survival model with expected value
merlin(model = list(logb ~ year + M1[id] * 1,
                    Surv(stime, died) ~ trt + EV[logb]),
       family = c("gaussian", "weibull"),
       levels = "id",
       timevar = c("year","stime"),
       data = pbc.merlin)

# Gaussian distribution - implemented as a user-written family
logl_gaussian <- function(gml)
{
  y    <- merlin_util_depvar(gml)
  xzb  <- merlin_util_xzb(gml)
  se   <- exp(merlin_util_ap(gml,1))

  mu   <- (sweep(xzb,1,y,"-"))^2
  logl <- ((-0.5 * log(2*pi) - log(se)) - (mu/(2 * se^2)))
  return(logl)
}

merlin(logb ~ year + ap(1), family = "user", data = pbc.merlin,
                                  userf = "logl_gaussian")

# 3-level Weibull model
merlin(Surv(stime1,dead1) ~ age + M1[id1]*1 + M2[id2]*1,
       levels=c("id1","id2"), family="weibull", data=sim3)

## End(Not run)