Design: A GLMM Design

DesignR Documentation

A GLMM Design

Description

A GLMM Design

A GLMM Design

Details

An R6 class representing a GLMM and study design

For the generalised linear mixed model

Y \sim F(μ,σ)

μ = h^-1(Xβ + Zγ)

γ \sim MVN(0,D)

where h is the link function. A Design in comprised of a MeanFunction object, which defines the family F, link function h, and fixed effects design matrix X, and a Covariance object, which defines Z and D. The class provides methods for analysis and simulation with these models.

This class provides methods for: data simulation (sim_data() and fitted()), model fitting using Markov Chain Monte Carlo Maximum Likelihood (MCML) methods (MCML()), design analysis via simulation including power (analysis()), deletion diagnostics (dfbeta()), and permutation tests including p-values and confidence intervals (permutation()).

The class by default calculates the covariance matrix of the observations as:

Σ = W^{-1} + ZDZ^T

where W is a diagonal matrix with the WLS iterated weights for each observation equal to, for individual i φ a_i v(μ_i)[h'(μ_i)]^2 (see Table 2.1 in McCullagh and Nelder (1989) ISBN:9780412317606). For very large designs, this can be disabled as the memory requirements can be prohibitive.

Calls the respective print methods of the linked covariance and mean function objects.

The matrices X and Z both have n rows, where n is the number of observations in the model/design.

Number of clusters Returns a data frame describing the number of independent clusters or groups at each level in the design. For example, if there were cluster-periods nested in clusters, then the top level would be clusters, and the second level would be cluster periods.

analysis The analysis function conducts a detailed design analysis using the analysis model specified by the object. Data are simulated either using the same data generating process, or using a different Design object specified by the user to allow for model misspecification. On each iteration the model is estimated with the simulated data y using either the MCML function or approximate parameters and standard errors using generalised least squares. MCML is an exact maximum likelihood algorithm, and can be slow, so results from previous simulations are saved in the design object and can be recalled later. Deletion diagnostics are also calculated to calculate influential parts of the design, although these are typically not useful for balanced experimental designs.

The function returns an glmmr.sim object, which estimates and summarises:

Model fitting and simulation diagnostics Convergence failure percent and coverage. Maximum likelihood estimators for GLMMs can fail to converge or reach the MLE. GLMMs can also be susceptible to small sample biases where the relevant sample size is at the level of clustering and correlation.

Error rates Type 2 (power), Type S (significance), and Type M (magnitude) errors are reported.

p-value and confidence interval width distributions

Deletion diagnostics For unbalanced designs and under model misspecifications, certain parts of the design may have more influence than others over the estimate of interest, or have a larger than desired effect. A summary of the DFBETA diagnostic is provided.

Approximate power Calculates the approximate power of the design using the square root of the relevant element of the GLS variance matrix:

(X^TΣ^{-1}X)^{-1}

Note that this is equivalent to using the "design effect" for many models.

MCMCML Fits generalised linear mixed models using one of three algorithms: Markov Chain Newton Raphson (MCNR), Markov Chain Expectation Maximisation (MCEM), or Maximum simulated likelihood (MSL). All the algorithms are described by McCullagh (1997). For each iteration of the algorithm the unobserved random effect terms (γ) are simulated using Markov Chain Monte Carlo (MCMC) methods (we use Hamiltonian Monte Carlo through Stan), and then these values are conditioned on in the subsequent steps to estimate the covariance parameters and the mean function parameters (β). For all the algorithms, the covariance parameter estimates are updated using an expectation maximisation step. For the mean function parameters you can either use a Newton Raphson step (MCNR) or an expectation maximisation step (MCEM). A simulated likelihood step can be added at the end of either MCNR or MCEM, which uses an importance sampling technique to refine the parameter estimates.

The accuracy of the algorithm depends on the user specified tolerance. For higher levels of tolerance, larger numbers of MCMC samples are likely need to sufficiently reduce Monte Carlo error.

The function also offers different methods of obtaining standard errors. First, one can generate estimates from the estimated Hessian matrix (se.method = 'lik'). Second, there are robust standard errors using a sandwich estimator based on White (1982) (se.method = 'robust'). Third, there are use approximate generalised least squares estimates based on the maximum likelihood estimates of the covariance parameters (se.method = 'approx'), or use a permutation test approach (se.method = 'perm'). Note that the permutation test can be accessed separately with the function permutation_test().

There are several options that can be specified to the function using the options argument. The options should be provided as a list, e.g. options = list(method="mcnr"). The possible options are:

  • b_se_only TRUE (calculate standard errors of the mean function parameters only) or FALSE (calculate all standard errors), default it FALSE.

  • use_cmdstanr TRUE (uses cmdstanr for the MCMC sampling, requires cmdstanr), or FALSE (uses rstan). Default is FALSE.

  • skip_cov_optim TRUE (skips the covariance parameter estimation step, and uses the values covariance$parameters), or FALSE (run the whole algorithm)], default is FALSE

  • sim_lik_step TRUE (conduct a simulated likelihood step at the end of the algorithm), or FALSE (does not do this step), defaults to FALSE.

  • no_warnings TRUE (do not report any warnings) or FALSE (report warnings), default to FALSE

  • perm_type Either cov (use weighted test statistic in permutation test) or unw (use unweighted test statistic), defaults to cov. See permutation_test().

  • perm_iter Number of iterations for the permutation test, default is 100.

  • perm_parallel TRUE (run permuation test in parallel) or FALSE (runs on a single thread), default to TRUE

  • warmup_iter Number of warmup iterations on each iteration for the MCMC sampler, default is 500

  • perm_ci_steps Number of steps for the confidence interval search procedure if using the permutation test, default is 1000. See permutation_test().

  • fd_tol The tolerance of the first difference method to estimate the Hessian and Gradient, default is 1e-4.

Permutation tests If the user provided a re-randomisation function to the linked mean function object (see MeanFunction), then a permuation test can be conducted. A new random assignment is generated on each iteration of the permutation test. The test statistic can be either a quasi-score statistic, weighting the observations using the covariance matrix (type="cov"), or an unweighted statistic that weights each observation in each cluster equally (type="unw"). The 1-alpha% confidence interval limits are estimated using an efficient iterative stochastic search procedure. On each step of the algorithm a single permuation and test statistic is generated, and the current estimate of the confidence interval limit either increased or decreased depedning on its value. The procedure converges in probability to the true limits, see Watson et al (2021) and Garthwaite (1996).

MCMC Draws samples from the posterior distribution of the model parameters using Stan. Priors are specified using the priors argument. Currently, only Gaussian (or half-Gaussian for covariance parameters) prior distributions are supported. The argument priors accepts a list with three or four elements, prior_b_mean, prior_b_sd which are vectors specifying the prior mean and standard deviation for the mean function parameters β in the model; prior_g_sd specifying the prior standard deviation for the half-Gaussian prior for the covariance parameters, and optionally sigma_sd for the half-Gaussian prior for the scale terms in models that have a scale parameter (Gaussian and Gamma currently). By default the function uses rstan, however the function can optionally call cmdstanr instead if it is installed, using the option use_cmdstanr=TRUE. For further details on the use and arguments that can be used, see sampling.

Bayesian design analysis The Bayesian design analysis conducts multiple iterations of simulating data from a GLMM and then fitting a GLMM to analyse the properties of the posterior distribution and model calibration to inform study design. Data are either simulated from the same model as the analysis model, i.e. there is correct model specification (nothing is specified for the option sim_design), or data are simulated from a different model (a Design object passed to the sim_design argument of this function) and then fit using the Design object from which this function was called, i.e. with model misspecification. The function analyses four related statistics summarising the design. First, the expected posterior variance; second, the variance of the posterior variance; third, the probability the parameter of interest will be greater than some threshold; and fourth, simulation-based calibration (see Talts et al, 2021).

Model fitting is done using Stan's sampler (see sampling and the MCMC() function in this class).

Public fields

covariance

A Covariance object defining the random effects covariance.

mean_function

A MeanFunction object, defining the mean function for the model, including the data and covariate design matrix X.

exp_condition

A vector indicting the unique experimental conditions for each observation, see Details.

Sigma

The overall covariance matrix for the observations. Calculated and updated automatically as W^{-1} + ZDZ^T where W is an n x n diagonal matrix with elements on the diagonal equal to the GLM iterated weights. See Details.

var_par

Scale parameter required for some distributions (Gaussian, Gamma, Beta).

Methods

Public methods


Method fitted()

Return predicted values based on the currently stored parameter values in mean_function

Usage
Design$fitted(type = "link")
Arguments
type

One of either "link" for values on the scale of the link function, or "response" for values on the scale of the response

Returns

A Matrix class object containing the predicted values


Method new()

Create a new Design object

Usage
Design$new(
  covariance,
  mean.function,
  var_par = NULL,
  verbose = TRUE,
  skip.sigma = FALSE
)
Arguments
covariance

Either a Covariance object, or an equivalent list of arguments that can be passed to Covariance to create a new object.

mean.function

Either a MeanFunction object, or an equivalent list of arguments that can be passed to MeanFunction to create a new object.

var_par

Scale parameter required for some distributions, including Gaussian. Default is NULL.

verbose

Logical indicating whether to provide detailed output

skip.sigma

Logical indicating whether to skip the creating of the covariance matrix Sigma. For very large designs with thousands of observations or more, the covariance matrix will be too big to fit in memory, so this option will prevent sigma being created.

Returns

A new Design class object

Examples
#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)

#alternatively we can pass the data directly to Design
#here we will specify a cohort study
df <- nelder(~ind(20) > t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Design$new(
covariance = list(
  data=df,
  formula = ~ (1|ar1(t)*gr(ind)),
  parameters = c(0.8,1)),
mean.function = list(
  formula = ~int + factor(t),
  data=df,
  parameters = rep(0,7),
  family = poisson()))
                  
#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Design$new(covariance = list(data=df,
                                           formula = ~(1|fexp(x,y)*ar1(t)),
                                           parameters =c(0.2,0.1,0.8)),
                         mean.function = list(data=df,
                                              formula = ~ 1,
                                              parameters = c(0.5),
                                              family=poisson())) 

Method print()

Print method for Design class

Usage
Design$print()
Arguments
...

ignored


Method n()

Returns the number of observations in the model

Usage
Design$n(...)
Arguments
...

ignored


Method n_cluster()

Returns the number of clusters at each level

Usage
Design$n_cluster()
Arguments
...

ignored

Returns

A data frame with the level, number of clusters, and variables describing each level.

Examples
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$n_cluster() ## returns two levels of 10 and 50

Method analysis()

Run a design analysis of the model via simulation

Usage
Design$analysis(
  type,
  iter,
  par,
  alpha = 0.05,
  sim_design,
  parallel = TRUE,
  verbose = TRUE,
  options = list(),
  ...
)
Arguments
type

One of either sim_data (recalls saved data from a previous call to this function), sim (full simulation using MCML), or sim_approx ( uses GLS to approximate the MLE and standard errors)

iter

Integer. The number of iterations of the simulation to run

par

Integer. The parameter of interest for which design analysis statistics should be calculated. Refers to the column of X.

alpha

Numeric. The type I error rate.

sim_design

Optional. A different Design object that will be used to simulate data to allow for model misspecification.

parallel

Logical indicating whether to run the simulations in parallel

verbose

Logical indicating whether to report detailed output. Defaults to TRUE.

options

Optional. A named list to pass to the options argument of MCML

...

Additional arguments passed to MCML, see below.

Returns

A glmmr.sim object containing the estimates from all the simulations, including standard errors, deletion diagnostic statistics, and details about the simulation.

Examples
\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
# analysis using MCML mcem algorithm
test1 <- des$analysis(type="sim",
                     iter=100,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)
#an analysis using the permutation test option and MCNR
test2 <- des$analysis(type="sim",
                     iter=100,
                     se.method="perm",
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     options = list(
                       perm_type="unw", perm_iter=100,
                       perm_parallel=FALSE,perm_ci_steps=1000)
                     method = "mcnr",
                     m = 100)
#returning previously saved sim data
test3 <- des$analysis(type="sim_data")

#to test model misspecification we can simulate from a different model
cov2 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)*ar1(t)),
  parameters = c(0.25,0.8)
)
des2 <- Design$new(
  covariance = cov2,
  mean.function = mf1,
  var_par = 1
)    
 test4 <- des$analysis(type="sim",
                     iter=100,
                     sim_design = des2,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)                       
}

Method power()

Approximate power of the design using the GLS variance

Usage
Design$power(par, value, alpha = 0.05)
Arguments
par

Integer indicating which parameter of the design the power should be calculated for. Refers to the order of parameters and column of X

value

Numeric specifying the value of the parameter to calculate the power at

alpha

Numeric between zero and one indicating the type I error rate. Default of 0.05.

Returns

A value between zero and one indicating the approximate power of the design.

Examples
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$power(6,0.5) #power of 0.79

Method subset_rows()

Subsets the design keeping specified observations only

Given a vector of row indices, the corresponding rows will be kept and the other rows will be removed from the mean function and covariance

Usage
Design$subset_rows(index)
Arguments
index

Integer or vector integers listing the rows to keep

Returns

The function updates the object and nothing is returned

Examples
#generate a stepped wedge design and remove the first sequence
des <- stepped_wedge(8,10,icc=0.05)
ids_to_keep <- which(des$mean_function$data$J!=1)
des$subset_rows(ids_to_keep)

Method subset_cols()

Subsets the columns of the design

Removes the specified columns from the linked mean function object's X matrix.

Usage
Design$subset_cols(index)
Arguments
index

Integer or vector of integers specifying the indexes of the columns to keep

Returns

The function updates the object and nothing is returned

Examples
#generate a stepped wedge design and remove first and last time periods
des <- stepped_wedge(8,10,icc=0.05)
des$subset_cols(c(2:8,10))

Method plot()

Generates a plot of the design

Generates a 'bubble' plot of the design with respect to two or three variables in which the size of the points at each location are scaled by the number of observations at that location. For example, for a cluster randomised trial the user might specify time period on the x-axis and cluster ID on the y-axis. For a geospatial sampling design the x and y axes might represent spatial dimensions.

Usage
Design$plot(x, y, z = NULL, treat)
Arguments
x

String naming a column in the data frame in the linked covariance object (self$covariance$data) for the x-axis

y

String naming a column in the data frame in the linked covariance object (self$covariance$data) for the y-axis

z

Optional. String naming a column in the data frame in the linked covariance object (self$covariance$data) for a 'third axis' used for faceting

treat

String naming a column in the data frame in the linked mean function object (self$mean_function$data) that identifies the treatment status of the observations at each location, used to set the colour of the points in the plot

Returns

A ggplot2 plot

Examples
\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$plot(x="cl",y="t",treat="int")
}

Method sim_data()

Generates a realisation of the design

Generates a single vector of outcome data based upon the specified GLMM design

Usage
Design$sim_data(type = "y")
Arguments
type

Either 'y' to return just the outcome data, or 'data' to return a data frame with the simulated outcome data alongside the model data

Returns

Either a vector or a data frame

Examples
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()

Method check()

Checks for any changes in linked objects and updates.

Checks for any changes in any object and updates all linked objects if any are detected. Generally called automatically.

Usage
Design$check(verbose = TRUE)
Arguments
verbose

Logical indicating whether to report if any updates are made, defaults to TRUE

Returns

Linked objects are updated by nothing is returned

Examples
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$check() #does nothing
des$covariance$parameters <- c(0.1,0.9)
des$check() #updates 
des$mean_function$parameters <- c(rnorm(5),0.1)
des$check() #updates 

Method MCML()

Markov Chain Monte Carlo Maximum Likelihood model fitting

Usage
Design$MCML(
  y,
  start,
  se.method = "lik",
  method = "mcnr",
  permutation.par,
  verbose = TRUE,
  tol = 0.01,
  m = 100,
  max.iter = 30,
  options = list()
)
Arguments
y

A numeric vector of outcome data

start

Optional. A numeric vector indicating starting values for the MCML algorithm iterations. If this is not specified then the parameter values stored in the linked mean function object will be used.

se.method

One of either 'lik', 'approx', 'perm', or 'none', see Details.

method

The MCML algorithm to use, either mcem or mcnr, see Details. Default is mcem.

permutation.par

Optional. Integer specifing the index of the parameter if permutation tests are being used.

verbose

Logical indicating whether to provide detailed output, defaults to TRUE.

tol

Numeric value, tolerance of the MCML algorithm, the maximum difference in parameter estimates between iterations at which to stop the algorithm.

m

Integer. The number of MCMC draws of the random effects on each iteration.

max.iter

Integer. The maximum number of iterations of the MCML algorithm.

options

An optional list providing options to the algorithm, see Details.

Returns

A mcml object

Examples
\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the models using MCEM but does not estimate standard errors
fit1 <- des$MCML(y = ysim,
  se.method = "none")
#fits the models using MCNR but does not estimate standard errors
fit2 <- des$MCML(y = ysim,
  se.method = "none",
  method="mcnr")
#fits the models and uses permutation tests for parameter of interest
fit3 <- des$MCML(y = ysim,
  se.method = "perm",
  permutation.par = 6,
  options = list(
    perm_type="unw",
    perm_iter=1000,
    perm_parallel=FALSE,
    perm_ci_steps=1000
  ))
 #adds a simulated likelihood step after the MCEM algorithm
fit4 <- des$MCML(y = des$sim_data(),
  se.method = "none",
  options = list(
    sim_lik_step=TRUE
  ))  
}

Method dfbeta()

Calculates DFBETA deletion diagnostic values

Calculates the DFBETA deletion diagnostic statistic for each observation for the specified parameter.

Usage
Design$dfbeta(y, par)
Arguments
y

Numeric vector of outcomes data

par

Integer indicating which parameter, as a column of X, to report the deletion diagnostics for.

Returns

A vector of length self$n() with the value of DFBETA for each observation for the specified parameter

Examples
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
des$dfbeta(ysim,6)

Method permutation_test()

Conducts a permuation test

Estimates p-values and confidence intervals using a permutation test

Usage
Design$permutation_test(
  y,
  permutation.par,
  start,
  iter = 1000,
  nsteps = 1000,
  type = "cov",
  parallel = TRUE,
  verbose = TRUE
)
Arguments
y

Numeric vector of outcome data

permutation.par

Integer indicator which parameter to conduct a permutation test for. Refers to a column of the X matrix.

start

Value of the parameter. Used both as a starting value for the algorithms and as a best estimate for the confidence interval search procedure.

iter

Integer. Number of iterations of the permuation test to conduct

nsteps

Integer. Number of steps of the confidence interval search procedure

type

Either cov for a test statistic weighted by the covariance matrix, or unw for an unweighted test statistic. See Details.

parallel

Logical indicating whether to run the tests in parallel

verbose

Logical indicating whether to report detailed output

Returns

A list with the estimated p-value and the estimated lower and upper 95% confidence interval

Examples
\dontrun{
 df <- nelder(~(cl(6)*t(5)) > ind(5))
 df$int <- 0
 df[df$cl > 3, 'int'] <- 1
 #generate function that produces random allocations
 treatf <- function(){
             tr <- sample(rep(c(0,1),each=3),6,replace = FALSE)
             rep(tr,each=25)
             }
 mf1 <- MeanFunction$new(
   formula = ~ factor(t) + int - 1,
   data=df,
   parameters = c(rep(0,5),0.6),
   family =gaussian(),
   treat_var = "int",
   random_function = treatf)
 cov1 <- Covariance$new(
   data = df,
   formula = ~ (1|gr(cl)),
   parameters = c(0.25))
 des <- Design$new(
   covariance = cov1,
   mean.function = mf1,
   var_par = 1)
 #run MCML to get parameter estimate:
 fit1 <- des$MCML(y = ysim,
  se.method = "none")
 perm1 <- des$permutation_test(
   y=ysim,
   permutation.par=6,
   start = fit1$coefficients$est[6],
   type="unw",
   iter = 1000,
   nsteps = 1000) 
}

Method MCMC()

Fit the GLMM using MCMC

Fit the GLMM using MCMC. The function calls a Stan program to draw posterior samples.

Usage
Design$MCMC(
  y,
  priors,
  warmup_iter = 1000,
  sampling_iter = 1000,
  chains = 4,
  use_cmdstanr = FALSE,
  parallel = TRUE,
  ...
)
Arguments
y

Numeric vector providing the outcome data

priors

A named list specifying the prior mean and standard deviation for the Gaussian prior distributions, see Details.

warmup_iter

A positive integer specifying the number of warmup iterations for each MCMC chain

sampling_iter

A positive integer specifying the number of sampling iterations for each MCMC chain

chains

A positive integer specifying the number of MCMC chains to run

use_cmdstanr

Logical indicating whether to use cmdstanr, the default is FALSE, which will use rstan

parallel

Logical indicating whether to run the chains in parallel

...

Additional arguments to pass to sampling.

Returns

Either an S4 stanfit object returned by sampling, or a cmdstanr environment, depending on the sampler used.

Examples
\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the model with default priors 
fit1 <- des$MCMC(y=ysim)
#specify priors
fit2 <- des$MCMC(y=ysim,
  priors = list(
    prior_b_mean = rep(0,6),
    prior_b_sd = c(rep(3,5),1),
    prior_g_sd = rep(1,2)))
}

Method analysis_bayesian()

Bayesian design analysis

Runs a Bayesian design analysis using simulated data

Usage
Design$analysis_bayesian(
  iter,
  par,
  priors,
  threshold = 0,
  sim_design,
  priors_sim,
  warmup_iter = 200,
  sampling_iter = 200,
  chains = 1,
  parallel = TRUE,
  verbose = TRUE,
  use_cmdstanr = FALSE,
  ...
)
Arguments
iter

A positive integer specifying the number of simulation iterations to run

par

A positive interger specifying the index of the parameter in the mean function to analyse, refers specifically to the column of X

priors

A named list specifying the priors, see Details in the MCMC() function in this class

threshold

A number specifying the threshold. The probability that the parameter of interest is greater than this threshold will be calculated.

sim_design

Optional. A different Design object that should be used to simulate the data for the simulation.

priors_sim

Optional. A named list of the same structure as priors the specifies the priors from which to simulate data, if they are different to the priors for model fitting.

warmup_iter

A positive integer specifying the number of warmup iterations for each MCMC chain when fitting the model on each simulation iteration.

sampling_iter

A positive integer specifying the number of sampling iterations for each MCMC chain when fitting the model on each simulation iteration.

chains

A positive integer specifying the number of MCMC chains when fitting the model on each simulation iteration.

parallel

Logical indicating whether to run the MCMC chains in parallel

verbose

Logical indicating whether to provide detailed output of the progress of simulations.

use_cmdstanr

Logical indicating whether to use cmdstanr instead of rstan, the default is FALSE (i.e. use rstan)

...

Additional options to pass to rstan::sampling or cmdstanr::sample

Returns

A glmmr.sim object containing the samples of posterior variance and relevant probabilities to summarise the analysis

Examples
\dontrun{
#' df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test <- des$analysis_bayesian(iter=10,
  par=6,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
#to examine misspecification we can sample from a different model 
des2 <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)),
    parameters = c(0.25)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test2 <- des$analysis_bayesian(iter=10,
  par=6,
  sim_design=des2,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
}

Method clone()

The objects of this class are cloneable with this method.

Usage
Design$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Braun and Feng McCullagh Stan McCullagh and Nelder Approx GLMMs paper Watson confidence interval

Watson et al. Arxiv Braun and Feng Gail Garthwaite

SBC paper (Talts) APV paper

See Also

nelder, MeanFunction, Covariance

print.glmmr.sim, MCML

sampling, stan

Examples


## ------------------------------------------------
## Method `Design$new`
## ------------------------------------------------

#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)

#alternatively we can pass the data directly to Design
#here we will specify a cohort study
df <- nelder(~ind(20) > t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Design$new(
covariance = list(
  data=df,
  formula = ~ (1|ar1(t)*gr(ind)),
  parameters = c(0.8,1)),
mean.function = list(
  formula = ~int + factor(t),
  data=df,
  parameters = rep(0,7),
  family = poisson()))
                  
#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Design$new(covariance = list(data=df,
                                           formula = ~(1|fexp(x,y)*ar1(t)),
                                           parameters =c(0.2,0.1,0.8)),
                         mean.function = list(data=df,
                                              formula = ~ 1,
                                              parameters = c(0.5),
                                              family=poisson())) 

## ------------------------------------------------
## Method `Design$n_cluster`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$n_cluster() ## returns two levels of 10 and 50

## ------------------------------------------------
## Method `Design$analysis`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
# analysis using MCML mcem algorithm
test1 <- des$analysis(type="sim",
                     iter=100,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)
#an analysis using the permutation test option and MCNR
test2 <- des$analysis(type="sim",
                     iter=100,
                     se.method="perm",
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     options = list(
                       perm_type="unw", perm_iter=100,
                       perm_parallel=FALSE,perm_ci_steps=1000)
                     method = "mcnr",
                     m = 100)
#returning previously saved sim data
test3 <- des$analysis(type="sim_data")

#to test model misspecification we can simulate from a different model
cov2 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)*ar1(t)),
  parameters = c(0.25,0.8)
)
des2 <- Design$new(
  covariance = cov2,
  mean.function = mf1,
  var_par = 1
)    
 test4 <- des$analysis(type="sim",
                     iter=100,
                     sim_design = des2,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)                       

## End(Not run)

## ------------------------------------------------
## Method `Design$power`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$power(6,0.5) #power of 0.79

## ------------------------------------------------
## Method `Design$subset_rows`
## ------------------------------------------------

#generate a stepped wedge design and remove the first sequence
des <- stepped_wedge(8,10,icc=0.05)
ids_to_keep <- which(des$mean_function$data$J!=1)
des$subset_rows(ids_to_keep)

## ------------------------------------------------
## Method `Design$subset_cols`
## ------------------------------------------------

#generate a stepped wedge design and remove first and last time periods
des <- stepped_wedge(8,10,icc=0.05)
des$subset_cols(c(2:8,10))

## ------------------------------------------------
## Method `Design$plot`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$plot(x="cl",y="t",treat="int")

## End(Not run)

## ------------------------------------------------
## Method `Design$sim_data`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()

## ------------------------------------------------
## Method `Design$check`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$check() #does nothing
des$covariance$parameters <- c(0.1,0.9)
des$check() #updates 
des$mean_function$parameters <- c(rnorm(5),0.1)
des$check() #updates 

## ------------------------------------------------
## Method `Design$MCML`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the models using MCEM but does not estimate standard errors
fit1 <- des$MCML(y = ysim,
  se.method = "none")
#fits the models using MCNR but does not estimate standard errors
fit2 <- des$MCML(y = ysim,
  se.method = "none",
  method="mcnr")
#fits the models and uses permutation tests for parameter of interest
fit3 <- des$MCML(y = ysim,
  se.method = "perm",
  permutation.par = 6,
  options = list(
    perm_type="unw",
    perm_iter=1000,
    perm_parallel=FALSE,
    perm_ci_steps=1000
  ))
 #adds a simulated likelihood step after the MCEM algorithm
fit4 <- des$MCML(y = des$sim_data(),
  se.method = "none",
  options = list(
    sim_lik_step=TRUE
  ))  

## End(Not run)

## ------------------------------------------------
## Method `Design$dfbeta`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
des$dfbeta(ysim,6)

## ------------------------------------------------
## Method `Design$permutation_test`
## ------------------------------------------------

## Not run: 
 df <- nelder(~(cl(6)*t(5)) > ind(5))
 df$int <- 0
 df[df$cl > 3, 'int'] <- 1
 #generate function that produces random allocations
 treatf <- function(){
             tr <- sample(rep(c(0,1),each=3),6,replace = FALSE)
             rep(tr,each=25)
             }
 mf1 <- MeanFunction$new(
   formula = ~ factor(t) + int - 1,
   data=df,
   parameters = c(rep(0,5),0.6),
   family =gaussian(),
   treat_var = "int",
   random_function = treatf)
 cov1 <- Covariance$new(
   data = df,
   formula = ~ (1|gr(cl)),
   parameters = c(0.25))
 des <- Design$new(
   covariance = cov1,
   mean.function = mf1,
   var_par = 1)
 #run MCML to get parameter estimate:
 fit1 <- des$MCML(y = ysim,
  se.method = "none")
 perm1 <- des$permutation_test(
   y=ysim,
   permutation.par=6,
   start = fit1$coefficients$est[6],
   type="unw",
   iter = 1000,
   nsteps = 1000) 

## End(Not run)

## ------------------------------------------------
## Method `Design$MCMC`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the model with default priors 
fit1 <- des$MCMC(y=ysim)
#specify priors
fit2 <- des$MCMC(y=ysim,
  priors = list(
    prior_b_mean = rep(0,6),
    prior_b_sd = c(rep(3,5),1),
    prior_g_sd = rep(1,2)))

## End(Not run)

## ------------------------------------------------
## Method `Design$analysis_bayesian`
## ------------------------------------------------

## Not run: 
#' df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test <- des$analysis_bayesian(iter=10,
  par=6,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
#to examine misspecification we can sample from a different model 
des2 <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)),
    parameters = c(0.25)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test2 <- des$analysis_bayesian(iter=10,
  par=6,
  sim_design=des2,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)

## End(Not run)

samuel-watson/glmmr documentation built on July 27, 2022, 10:30 p.m.