Design: A GLMM Design
In samuel-watson/glmmr: Design and analysis for generalised linear mixed models in R

Design

R Documentation

A GLMM Design

Description

A GLMM Design

Details

An R6 class representing a GLMM and study design

For the generalised linear mixed model

Y \sim F(μ,σ)

μ = h^-1(Xβ + Zγ)

γ \sim MVN(0,D)

where h is the link function. A Design in comprised of a MeanFunction object, which defines the family F, link function h, and fixed effects design matrix X, and a Covariance object, which defines Z and D. The class provides methods for analysis and simulation with these models.

This class provides methods for: data simulation (sim_data() and fitted()), model fitting using Markov Chain Monte Carlo Maximum Likelihood (MCML) methods (MCML()), design analysis via simulation including power (analysis()), deletion diagnostics (dfbeta()), and permutation tests including p-values and confidence intervals (permutation()).

The class by default calculates the covariance matrix of the observations as:

Σ = W^{-1} + ZDZ^T

where W is a diagonal matrix with the WLS iterated weights for each observation equal to, for individual i φ a_i v(μ_i)[h'(μ_i)]^2 (see Table 2.1 in McCullagh and Nelder (1989) ISBN:9780412317606). For very large designs, this can be disabled as the memory requirements can be prohibitive.

Calls the respective print methods of the linked covariance and mean function objects.

The matrices X and Z both have n rows, where n is the number of observations in the model/design.

Number of clusters Returns a data frame describing the number of independent clusters or groups at each level in the design. For example, if there were cluster-periods nested in clusters, then the top level would be clusters, and the second level would be cluster periods.

analysis The analysis function conducts a detailed design analysis using the analysis model specified by the object. Data are simulated either using the same data generating process, or using a different Design object specified by the user to allow for model misspecification. On each iteration the model is estimated with the simulated data y using either the MCML function or approximate parameters and standard errors using generalised least squares. MCML is an exact maximum likelihood algorithm, and can be slow, so results from previous simulations are saved in the design object and can be recalled later. Deletion diagnostics are also calculated to calculate influential parts of the design, although these are typically not useful for balanced experimental designs.

The function returns an glmmr.sim object, which estimates and summarises:

Model fitting and simulation diagnostics Convergence failure percent and coverage. Maximum likelihood estimators for GLMMs can fail to converge or reach the MLE. GLMMs can also be susceptible to small sample biases where the relevant sample size is at the level of clustering and correlation.

Error rates Type 2 (power), Type S (significance), and Type M (magnitude) errors are reported.

p-value and confidence interval width distributions

Deletion diagnostics For unbalanced designs and under model misspecifications, certain parts of the design may have more influence than others over the estimate of interest, or have a larger than desired effect. A summary of the DFBETA diagnostic is provided.

Approximate power Calculates the approximate power of the design using the square root of the relevant element of the GLS variance matrix:

(X^TΣ^{-1}X)^{-1}

Note that this is equivalent to using the "design effect" for many models.

MCMCML Fits generalised linear mixed models using one of three algorithms: Markov Chain Newton Raphson (MCNR), Markov Chain Expectation Maximisation (MCEM), or Maximum simulated likelihood (MSL). All the algorithms are described by McCullagh (1997). For each iteration of the algorithm the unobserved random effect terms (γ) are simulated using Markov Chain Monte Carlo (MCMC) methods (we use Hamiltonian Monte Carlo through Stan), and then these values are conditioned on in the subsequent steps to estimate the covariance parameters and the mean function parameters (β). For all the algorithms, the covariance parameter estimates are updated using an expectation maximisation step. For the mean function parameters you can either use a Newton Raphson step (MCNR) or an expectation maximisation step (MCEM). A simulated likelihood step can be added at the end of either MCNR or MCEM, which uses an importance sampling technique to refine the parameter estimates.

The accuracy of the algorithm depends on the user specified tolerance. For higher levels of tolerance, larger numbers of MCMC samples are likely need to sufficiently reduce Monte Carlo error.

The function also offers different methods of obtaining standard errors. First, one can generate estimates from the estimated Hessian matrix (se.method = 'lik'). Second, there are robust standard errors using a sandwich estimator based on White (1982) (se.method = 'robust'). Third, there are use approximate generalised least squares estimates based on the maximum likelihood estimates of the covariance parameters (se.method = 'approx'), or use a permutation test approach (se.method = 'perm'). Note that the permutation test can be accessed separately with the function permutation_test().

There are several options that can be specified to the function using the options argument. The options should be provided as a list, e.g. options = list(method="mcnr"). The possible options are:

b_se_only TRUE (calculate standard errors of the mean function parameters only) or FALSE (calculate all standard errors), default it FALSE.
use_cmdstanr TRUE (uses cmdstanr for the MCMC sampling, requires cmdstanr), or FALSE (uses rstan). Default is FALSE.
skip_cov_optim TRUE (skips the covariance parameter estimation step, and uses the values covariance$parameters), or FALSE (run the whole algorithm)], default is FALSE
sim_lik_step TRUE (conduct a simulated likelihood step at the end of the algorithm), or FALSE (does not do this step), defaults to FALSE.
no_warnings TRUE (do not report any warnings) or FALSE (report warnings), default to FALSE
perm_type Either cov (use weighted test statistic in permutation test) or unw (use unweighted test statistic), defaults to cov. See permutation_test().
perm_iter Number of iterations for the permutation test, default is 100.
perm_parallel TRUE (run permuation test in parallel) or FALSE (runs on a single thread), default to TRUE
warmup_iter Number of warmup iterations on each iteration for the MCMC sampler, default is 500
perm_ci_steps Number of steps for the confidence interval search procedure if using the permutation test, default is 1000. See permutation_test().
fd_tol The tolerance of the first difference method to estimate the Hessian and Gradient, default is 1e-4.

Permutation tests If the user provided a re-randomisation function to the linked mean function object (see MeanFunction), then a permuation test can be conducted. A new random assignment is generated on each iteration of the permutation test. The test statistic can be either a quasi-score statistic, weighting the observations using the covariance matrix (type="cov"), or an unweighted statistic that weights each observation in each cluster equally (type="unw"). The 1-alpha% confidence interval limits are estimated using an efficient iterative stochastic search procedure. On each step of the algorithm a single permuation and test statistic is generated, and the current estimate of the confidence interval limit either increased or decreased depedning on its value. The procedure converges in probability to the true limits, see Watson et al (2021) and Garthwaite (1996).

MCMC Draws samples from the posterior distribution of the model parameters using Stan. Priors are specified using the priors argument. Currently, only Gaussian (or half-Gaussian for covariance parameters) prior distributions are supported. The argument priors accepts a list with three or four elements, prior_b_mean, prior_b_sd which are vectors specifying the prior mean and standard deviation for the mean function parameters β in the model; prior_g_sd specifying the prior standard deviation for the half-Gaussian prior for the covariance parameters, and optionally sigma_sd for the half-Gaussian prior for the scale terms in models that have a scale parameter (Gaussian and Gamma currently). By default the function uses rstan, however the function can optionally call cmdstanr instead if it is installed, using the option use_cmdstanr=TRUE. For further details on the use and arguments that can be used, see sampling.

Bayesian design analysis The Bayesian design analysis conducts multiple iterations of simulating data from a GLMM and then fitting a GLMM to analyse the properties of the posterior distribution and model calibration to inform study design. Data are either simulated from the same model as the analysis model, i.e. there is correct model specification (nothing is specified for the option sim_design), or data are simulated from a different model (a Design object passed to the sim_design argument of this function) and then fit using the Design object from which this function was called, i.e. with model misspecification. The function analyses four related statistics summarising the design. First, the expected posterior variance; second, the variance of the posterior variance; third, the probability the parameter of interest will be greater than some threshold; and fourth, simulation-based calibration (see Talts et al, 2021).

Model fitting is done using Stan's sampler (see sampling and the MCMC() function in this class).

Public fields

covariance: A Covariance object defining the random effects covariance.
mean_function: A MeanFunction object, defining the mean function for the model, including the data and covariate design matrix X.
exp_condition: A vector indicting the unique experimental conditions for each observation, see Details.
Sigma: The overall covariance matrix for the observations. Calculated and updated automatically as W^{-1} + ZDZ^T where W is an n x n diagonal matrix with elements on the diagonal equal to the GLM iterated weights. See Details.
var_par: Scale parameter required for some distributions (Gaussian, Gamma, Beta).

Methods

Public methods

Design$fitted()
Design$new()
Design$print()
Design$n()
Design$n_cluster()
Design$analysis()
Design$power()
Design$subset_rows()
Design$subset_cols()
Design$plot()
Design$sim_data()
Design$check()
Design$MCML()
Design$dfbeta()
Design$permutation_test()
Design$MCMC()
Design$analysis_bayesian()
Design$clone()

Method `fitted()`

Return predicted values based on the currently stored parameter values in mean_function

Usage

Design$fitted(type = "link")

Arguments

type: One of either "link" for values on the scale of the link function, or "response" for values on the scale of the response

Returns

A Matrix class object containing the predicted values

Method `new()`

Create a new Design object

Usage

Design$new(
  covariance,
  mean.function,
  var_par = NULL,
  verbose = TRUE,
  skip.sigma = FALSE
)

Arguments

covariance: Either a Covariance object, or an equivalent list of arguments that can be passed to Covariance to create a new object.
mean.function: Either a MeanFunction object, or an equivalent list of arguments that can be passed to MeanFunction to create a new object.
var_par: Scale parameter required for some distributions, including Gaussian. Default is NULL.
verbose: Logical indicating whether to provide detailed output
skip.sigma: Logical indicating whether to skip the creating of the covariance matrix Sigma. For very large designs with thousands of observations or more, the covariance matrix will be too big to fit in memory, so this option will prevent sigma being created.

Returns

A new Design class object

Examples

#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)

#alternatively we can pass the data directly to Design
#here we will specify a cohort study
df <- nelder(~ind(20) > t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Design$new(
covariance = list(
  data=df,
  formula = ~ (1|ar1(t)*gr(ind)),
  parameters = c(0.8,1)),
mean.function = list(
  formula = ~int + factor(t),
  data=df,
  parameters = rep(0,7),
  family = poisson()))
                  
#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Design$new(covariance = list(data=df,
                                           formula = ~(1|fexp(x,y)*ar1(t)),
                                           parameters =c(0.2,0.1,0.8)),
                         mean.function = list(data=df,
                                              formula = ~ 1,
                                              parameters = c(0.5),
                                              family=poisson()))

Method `print()`

Print method for Design class

Usage

Design$print()

Arguments

...: ignored

Method `n()`

Returns the number of observations in the model

Usage

Design$n(...)

Arguments

...: ignored

Method `n_cluster()`

Returns the number of clusters at each level

Usage

Design$n_cluster()

Arguments

...: ignored

Returns

A data frame with the level, number of clusters, and variables describing each level.

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$n_cluster() ## returns two levels of 10 and 50

Method `analysis()`

Run a design analysis of the model via simulation

Usage

Design$analysis(
  type,
  iter,
  par,
  alpha = 0.05,
  sim_design,
  parallel = TRUE,
  verbose = TRUE,
  options = list(),
  ...
)

Arguments

type: One of either sim_data (recalls saved data from a previous call to this function), sim (full simulation using MCML), or sim_approx ( uses GLS to approximate the MLE and standard errors)
iter: Integer. The number of iterations of the simulation to run
par: Integer. The parameter of interest for which design analysis statistics should be calculated. Refers to the column of X.
alpha: Numeric. The type I error rate.
sim_design: Optional. A different Design object that will be used to simulate data to allow for model misspecification.
parallel: Logical indicating whether to run the simulations in parallel
verbose: Logical indicating whether to report detailed output. Defaults to TRUE.
options: Optional. A named list to pass to the options argument of MCML
...: Additional arguments passed to MCML, see below.

Returns

A glmmr.sim object containing the estimates from all the simulations, including standard errors, deletion diagnostic statistics, and details about the simulation.

Examples

\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
# analysis using MCML mcem algorithm
test1 <- des$analysis(type="sim",
                     iter=100,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)
#an analysis using the permutation test option and MCNR
test2 <- des$analysis(type="sim",
                     iter=100,
                     se.method="perm",
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     options = list(
                       perm_type="unw", perm_iter=100,
                       perm_parallel=FALSE,perm_ci_steps=1000)
                     method = "mcnr",
                     m = 100)
#returning previously saved sim data
test3 <- des$analysis(type="sim_data")

#to test model misspecification we can simulate from a different model
cov2 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)*ar1(t)),
  parameters = c(0.25,0.8)
)
des2 <- Design$new(
  covariance = cov2,
  mean.function = mf1,
  var_par = 1
)    
 test4 <- des$analysis(type="sim",
                     iter=100,
                     sim_design = des2,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)                       
}

Method `power()`

Approximate power of the design using the GLS variance

Usage

Design$power(par, value, alpha = 0.05)

Arguments

par: Integer indicating which parameter of the design the power should be calculated for. Refers to the order of parameters and column of X
value: Numeric specifying the value of the parameter to calculate the power at
alpha: Numeric between zero and one indicating the type I error rate. Default of 0.05.

Returns

A value between zero and one indicating the approximate power of the design.

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$power(6,0.5) #power of 0.79

Method `subset_rows()`

Subsets the design keeping specified observations only

Given a vector of row indices, the corresponding rows will be kept and the other rows will be removed from the mean function and covariance

Usage

Design$subset_rows(index)

Arguments

index: Integer or vector integers listing the rows to keep

Returns

The function updates the object and nothing is returned

Examples

#generate a stepped wedge design and remove the first sequence
des <- stepped_wedge(8,10,icc=0.05)
ids_to_keep <- which(des$mean_function$data$J!=1)
des$subset_rows(ids_to_keep)

Method `subset_cols()`

Subsets the columns of the design

Removes the specified columns from the linked mean function object's X matrix.

Usage

Design$subset_cols(index)

Arguments

index: Integer or vector of integers specifying the indexes of the columns to keep

Returns

The function updates the object and nothing is returned

Examples

#generate a stepped wedge design and remove first and last time periods
des <- stepped_wedge(8,10,icc=0.05)
des$subset_cols(c(2:8,10))

Method `plot()`

Generates a plot of the design

Generates a 'bubble' plot of the design with respect to two or three variables in which the size of the points at each location are scaled by the number of observations at that location. For example, for a cluster randomised trial the user might specify time period on the x-axis and cluster ID on the y-axis. For a geospatial sampling design the x and y axes might represent spatial dimensions.

Usage

Design$plot(x, y, z = NULL, treat)

Arguments

x: String naming a column in the data frame in the linked covariance object (self$covariance$data) for the x-axis
y: String naming a column in the data frame in the linked covariance object (self$covariance$data) for the y-axis
z: Optional. String naming a column in the data frame in the linked covariance object (self$covariance$data) for a 'third axis' used for faceting
treat: String naming a column in the data frame in the linked mean function object (self$mean_function$data) that identifies the treatment status of the observations at each location, used to set the colour of the points in the plot

Returns

A ggplot2 plot

Examples

\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$plot(x="cl",y="t",treat="int")
}

Method `sim_data()`

Generates a realisation of the design

Generates a single vector of outcome data based upon the specified GLMM design

Usage

Design$sim_data(type = "y")

Arguments

type: Either 'y' to return just the outcome data, or 'data' to return a data frame with the simulated outcome data alongside the model data

Returns

Either a vector or a data frame

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()

Method `check()`

Checks for any changes in linked objects and updates.

Checks for any changes in any object and updates all linked objects if any are detected. Generally called automatically.

Usage

Design$check(verbose = TRUE)

Arguments

verbose: Logical indicating whether to report if any updates are made, defaults to TRUE

Returns

Linked objects are updated by nothing is returned

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$check() #does nothing
des$covariance$parameters <- c(0.1,0.9)
des$check() #updates 
des$mean_function$parameters <- c(rnorm(5),0.1)
des$check() #updates

Method `MCML()`

Markov Chain Monte Carlo Maximum Likelihood model fitting

Usage

Design$MCML(
  y,
  start,
  se.method = "lik",
  method = "mcnr",
  permutation.par,
  verbose = TRUE,
  tol = 0.01,
  m = 100,
  max.iter = 30,
  options = list()
)

Arguments

y: A numeric vector of outcome data
start: Optional. A numeric vector indicating starting values for the MCML algorithm iterations. If this is not specified then the parameter values stored in the linked mean function object will be used.
se.method: One of either 'lik', 'approx', 'perm', or 'none', see Details.
method: The MCML algorithm to use, either mcem or mcnr, see Details. Default is mcem.
permutation.par: Optional. Integer specifing the index of the parameter if permutation tests are being used.
verbose: Logical indicating whether to provide detailed output, defaults to TRUE.
tol: Numeric value, tolerance of the MCML algorithm, the maximum difference in parameter estimates between iterations at which to stop the algorithm.
m: Integer. The number of MCMC draws of the random effects on each iteration.
max.iter: Integer. The maximum number of iterations of the MCML algorithm.
options: An optional list providing options to the algorithm, see Details.

Returns

A mcml object

Examples

\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the models using MCEM but does not estimate standard errors
fit1 <- des$MCML(y = ysim,
  se.method = "none")
#fits the models using MCNR but does not estimate standard errors
fit2 <- des$MCML(y = ysim,
  se.method = "none",
  method="mcnr")
#fits the models and uses permutation tests for parameter of interest
fit3 <- des$MCML(y = ysim,
  se.method = "perm",
  permutation.par = 6,
  options = list(
    perm_type="unw",
    perm_iter=1000,
    perm_parallel=FALSE,
    perm_ci_steps=1000
  ))
 #adds a simulated likelihood step after the MCEM algorithm
fit4 <- des$MCML(y = des$sim_data(),
  se.method = "none",
  options = list(
    sim_lik_step=TRUE
  ))  
}

Method `dfbeta()`

Calculates DFBETA deletion diagnostic values

Calculates the DFBETA deletion diagnostic statistic for each observation for the specified parameter.

Usage

Design$dfbeta(y, par)

Arguments

y: Numeric vector of outcomes data
par: Integer indicating which parameter, as a column of X, to report the deletion diagnostics for.

Returns

A vector of length self$n() with the value of DFBETA for each observation for the specified parameter

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
des$dfbeta(ysim,6)

Method `permutation_test()`

Conducts a permuation test

Estimates p-values and confidence intervals using a permutation test

Usage

Design$permutation_test(
  y,
  permutation.par,
  start,
  iter = 1000,
  nsteps = 1000,
  type = "cov",
  parallel = TRUE,
  verbose = TRUE
)

Arguments

y: Numeric vector of outcome data
permutation.par: Integer indicator which parameter to conduct a permutation test for. Refers to a column of the X matrix.
start: Value of the parameter. Used both as a starting value for the algorithms and as a best estimate for the confidence interval search procedure.
iter: Integer. Number of iterations of the permuation test to conduct
nsteps: Integer. Number of steps of the confidence interval search procedure
type: Either cov for a test statistic weighted by the covariance matrix, or unw for an unweighted test statistic. See Details.
parallel: Logical indicating whether to run the tests in parallel
verbose: Logical indicating whether to report detailed output

Returns

A list with the estimated p-value and the estimated lower and upper 95% confidence interval

Examples

\dontrun{
 df <- nelder(~(cl(6)*t(5)) > ind(5))
 df$int <- 0
 df[df$cl > 3, 'int'] <- 1
 #generate function that produces random allocations
 treatf <- function(){
             tr <- sample(rep(c(0,1),each=3),6,replace = FALSE)
             rep(tr,each=25)
             }
 mf1 <- MeanFunction$new(
   formula = ~ factor(t) + int - 1,
   data=df,
   parameters = c(rep(0,5),0.6),
   family =gaussian(),
   treat_var = "int",
   random_function = treatf)
 cov1 <- Covariance$new(
   data = df,
   formula = ~ (1|gr(cl)),
   parameters = c(0.25))
 des <- Design$new(
   covariance = cov1,
   mean.function = mf1,
   var_par = 1)
 #run MCML to get parameter estimate:
 fit1 <- des$MCML(y = ysim,
  se.method = "none")
 perm1 <- des$permutation_test(
   y=ysim,
   permutation.par=6,
   start = fit1$coefficients$est[6],
   type="unw",
   iter = 1000,
   nsteps = 1000) 
}

Method `MCMC()`

Fit the GLMM using MCMC

Fit the GLMM using MCMC. The function calls a Stan program to draw posterior samples.

Usage

Design$MCMC(
  y,
  priors,
  warmup_iter = 1000,
  sampling_iter = 1000,
  chains = 4,
  use_cmdstanr = FALSE,
  parallel = TRUE,
  ...
)

Arguments

y: Numeric vector providing the outcome data
priors: A named list specifying the prior mean and standard deviation for the Gaussian prior distributions, see Details.
warmup_iter: A positive integer specifying the number of warmup iterations for each MCMC chain
sampling_iter: A positive integer specifying the number of sampling iterations for each MCMC chain
chains: A positive integer specifying the number of MCMC chains to run
use_cmdstanr: Logical indicating whether to use cmdstanr, the default is FALSE, which will use rstan
parallel: Logical indicating whether to run the chains in parallel
...: Additional arguments to pass to sampling.

Returns

Either an S4 stanfit object returned by sampling, or a cmdstanr environment, depending on the sampler used.

Examples

\dontrun{
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the model with default priors 
fit1 <- des$MCMC(y=ysim)
#specify priors
fit2 <- des$MCMC(y=ysim,
  priors = list(
    prior_b_mean = rep(0,6),
    prior_b_sd = c(rep(3,5),1),
    prior_g_sd = rep(1,2)))
}

Method `analysis_bayesian()`

Bayesian design analysis

Runs a Bayesian design analysis using simulated data

Usage

Design$analysis_bayesian(
  iter,
  par,
  priors,
  threshold = 0,
  sim_design,
  priors_sim,
  warmup_iter = 200,
  sampling_iter = 200,
  chains = 1,
  parallel = TRUE,
  verbose = TRUE,
  use_cmdstanr = FALSE,
  ...
)

Arguments

iter: A positive integer specifying the number of simulation iterations to run
par: A positive interger specifying the index of the parameter in the mean function to analyse, refers specifically to the column of X
priors: A named list specifying the priors, see Details in the MCMC() function in this class
threshold: A number specifying the threshold. The probability that the parameter of interest is greater than this threshold will be calculated.
sim_design: Optional. A different Design object that should be used to simulate the data for the simulation.
priors_sim: Optional. A named list of the same structure as priors the specifies the priors from which to simulate data, if they are different to the priors for model fitting.
warmup_iter: A positive integer specifying the number of warmup iterations for each MCMC chain when fitting the model on each simulation iteration.
sampling_iter: A positive integer specifying the number of sampling iterations for each MCMC chain when fitting the model on each simulation iteration.
chains: A positive integer specifying the number of MCMC chains when fitting the model on each simulation iteration.
parallel: Logical indicating whether to run the MCMC chains in parallel
verbose: Logical indicating whether to provide detailed output of the progress of simulations.
use_cmdstanr: Logical indicating whether to use cmdstanr instead of rstan, the default is FALSE (i.e. use rstan)
...: Additional options to pass to rstan::sampling or cmdstanr::sample

Returns

A glmmr.sim object containing the samples of posterior variance and relevant probabilities to summarise the analysis

Examples

\dontrun{
#' df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test <- des$analysis_bayesian(iter=10,
  par=6,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
#to examine misspecification we can sample from a different model 
des2 <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)),
    parameters = c(0.25)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test2 <- des$analysis_bayesian(iter=10,
  par=6,
  sim_design=des2,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
}

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Design$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Braun and Feng McCullagh Stan McCullagh and Nelder Approx GLMMs paper Watson confidence interval

Watson et al. Arxiv Braun and Feng Gail Garthwaite

SBC paper (Talts) APV paper

Examples


## ------------------------------------------------
## Method `Design$new`
## ------------------------------------------------

#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)

#alternatively we can pass the data directly to Design
#here we will specify a cohort study
df <- nelder(~ind(20) > t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Design$new(
covariance = list(
  data=df,
  formula = ~ (1|ar1(t)*gr(ind)),
  parameters = c(0.8,1)),
mean.function = list(
  formula = ~int + factor(t),
  data=df,
  parameters = rep(0,7),
  family = poisson()))
                  
#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Design$new(covariance = list(data=df,
                                           formula = ~(1|fexp(x,y)*ar1(t)),
                                           parameters =c(0.2,0.1,0.8)),
                         mean.function = list(data=df,
                                              formula = ~ 1,
                                              parameters = c(0.5),
                                              family=poisson())) 

## ------------------------------------------------
## Method `Design$n_cluster`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$n_cluster() ## returns two levels of 10 and 50

## ------------------------------------------------
## Method `Design$analysis`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
# analysis using MCML mcem algorithm
test1 <- des$analysis(type="sim",
                     iter=100,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)
#an analysis using the permutation test option and MCNR
test2 <- des$analysis(type="sim",
                     iter=100,
                     se.method="perm",
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     options = list(
                       perm_type="unw", perm_iter=100,
                       perm_parallel=FALSE,perm_ci_steps=1000)
                     method = "mcnr",
                     m = 100)
#returning previously saved sim data
test3 <- des$analysis(type="sim_data")

#to test model misspecification we can simulate from a different model
cov2 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)*ar1(t)),
  parameters = c(0.25,0.8)
)
des2 <- Design$new(
  covariance = cov2,
  mean.function = mf1,
  var_par = 1
)    
 test4 <- des$analysis(type="sim",
                     iter=100,
                     sim_design = des2,
                     par=6,
                     parallel = FALSE,
                     verbose = FALSE,
                     method = "mcem",
                     m = 100)                       

## End(Not run)

## ------------------------------------------------
## Method `Design$power`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

mf1 <- MeanFunction$new(
  formula = ~ factor(t) + int - 1,
  data=df,
  parameters = c(rep(0,5),0.6),
  family = gaussian()
)
cov1 <- Covariance$new(
  data = df,
  formula = ~ (1|gr(cl)) + (1|gr(cl*t)),
  parameters = c(0.25,0.1)
)
des <- Design$new(
  covariance = cov1,
  mean.function = mf1,
  var_par = 1
)
des$power(6,0.5) #power of 0.79

## ------------------------------------------------
## Method `Design$subset_rows`
## ------------------------------------------------

#generate a stepped wedge design and remove the first sequence
des <- stepped_wedge(8,10,icc=0.05)
ids_to_keep <- which(des$mean_function$data$J!=1)
des$subset_rows(ids_to_keep)

## ------------------------------------------------
## Method `Design$subset_cols`
## ------------------------------------------------

#generate a stepped wedge design and remove first and last time periods
des <- stepped_wedge(8,10,icc=0.05)
des$subset_cols(c(2:8,10))

## ------------------------------------------------
## Method `Design$plot`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$plot(x="cl",y="t",treat="int")

## End(Not run)

## ------------------------------------------------
## Method `Design$sim_data`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()

## ------------------------------------------------
## Method `Design$check`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
des$check() #does nothing
des$covariance$parameters <- c(0.1,0.9)
des$check() #updates 
des$mean_function$parameters <- c(rnorm(5),0.1)
des$check() #updates 

## ------------------------------------------------
## Method `Design$MCML`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the models using MCEM but does not estimate standard errors
fit1 <- des$MCML(y = ysim,
  se.method = "none")
#fits the models using MCNR but does not estimate standard errors
fit2 <- des$MCML(y = ysim,
  se.method = "none",
  method="mcnr")
#fits the models and uses permutation tests for parameter of interest
fit3 <- des$MCML(y = ysim,
  se.method = "perm",
  permutation.par = 6,
  options = list(
    perm_type="unw",
    perm_iter=1000,
    perm_parallel=FALSE,
    perm_ci_steps=1000
  ))
 #adds a simulated likelihood step after the MCEM algorithm
fit4 <- des$MCML(y = des$sim_data(),
  se.method = "none",
  options = list(
    sim_lik_step=TRUE
  ))  

## End(Not run)

## ------------------------------------------------
## Method `Design$dfbeta`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
des$dfbeta(ysim,6)

## ------------------------------------------------
## Method `Design$permutation_test`
## ------------------------------------------------

## Not run: 
 df <- nelder(~(cl(6)*t(5)) > ind(5))
 df$int <- 0
 df[df$cl > 3, 'int'] <- 1
 #generate function that produces random allocations
 treatf <- function(){
             tr <- sample(rep(c(0,1),each=3),6,replace = FALSE)
             rep(tr,each=25)
             }
 mf1 <- MeanFunction$new(
   formula = ~ factor(t) + int - 1,
   data=df,
   parameters = c(rep(0,5),0.6),
   family =gaussian(),
   treat_var = "int",
   random_function = treatf)
 cov1 <- Covariance$new(
   data = df,
   formula = ~ (1|gr(cl)),
   parameters = c(0.25))
 des <- Design$new(
   covariance = cov1,
   mean.function = mf1,
   var_par = 1)
 #run MCML to get parameter estimate:
 fit1 <- des$MCML(y = ysim,
  se.method = "none")
 perm1 <- des$permutation_test(
   y=ysim,
   permutation.par=6,
   start = fit1$coefficients$est[6],
   type="unw",
   iter = 1000,
   nsteps = 1000) 

## End(Not run)

## ------------------------------------------------
## Method `Design$MCMC`
## ------------------------------------------------

## Not run: 
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
ysim <- des$sim_data()
# fits the model with default priors 
fit1 <- des$MCMC(y=ysim)
#specify priors
fit2 <- des$MCMC(y=ysim,
  priors = list(
    prior_b_mean = rep(0,6),
    prior_b_sd = c(rep(3,5),1),
    prior_g_sd = rep(1,2)))

## End(Not run)

## ------------------------------------------------
## Method `Design$analysis_bayesian`
## ------------------------------------------------

## Not run: 
#' df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)*ar1(t)),
    parameters = c(0.25,0.8)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test <- des$analysis_bayesian(iter=10,
  par=6,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)
#to examine misspecification we can sample from a different model 
des2 <- Design$new(
  covariance = list(
    data = df,
    formula = ~ (1|gr(cl)),
    parameters = c(0.25)),
  mean.function = list(
    formula = ~ factor(t) + int - 1,
    data=df,
    parameters = c(rep(0,5),0.6),
    family = binomial())
)
test2 <- des$analysis_bayesian(iter=10,
  par=6,
  sim_design=des2,
  warmup_iter = 200,
  sampling_iter = 500,
  parallel = FALSE,
  verbose=FALSE)

## End(Not run)

samuel-watson/glmmr documentation built on July 27, 2022, 10:30 p.m.

samuel-watson/glmmr index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

samuel-watson/glmmr Design and analysis for generalised linear mixed models in R

Design: A GLMM Design In samuel-watson/glmmr: Design and analysis for generalised linear mixed models in R

A GLMM Design

Description

Details

Public fields

Methods

Public methods

Method fitted()

Usage

Arguments

Returns

Method new()

Usage

Arguments

Returns

Examples

Method print()

Usage

Arguments

Method n()

Usage

Arguments

Method n_cluster()

Usage

Arguments

Returns

Examples

Method analysis()

Usage

Arguments

Returns

Examples

Method power()

Usage

Arguments

Returns

Examples

Method subset_rows()

Usage

Arguments

Returns

Examples

Method subset_cols()

Usage

Arguments

Returns

Examples

Method plot()

Usage

Arguments

Returns

Examples

Method sim_data()

Usage

Arguments

Returns

Examples

Method check()

Usage

Arguments

Returns

Examples

Method MCML()

Usage

Arguments

Returns

Examples

Method dfbeta()

Usage

Arguments

Returns

Examples

Method permutation_test()

Usage

Arguments

Returns

Examples

Method MCMC()

Usage

samuel-watson/glmmr
Design and analysis for generalised linear mixed models in R

Design: A GLMM Design
In samuel-watson/glmmr: Design and analysis for generalised linear mixed models in R

Method `fitted()`

Method `new()`

Method `print()`

Method `n()`

Method `n_cluster()`

Method `analysis()`

Method `power()`

Method `subset_rows()`

Method `subset_cols()`

Method `plot()`

Method `sim_data()`

Method `check()`

Method `MCML()`

Method `dfbeta()`

Method `permutation_test()`

Method `MCMC()`

Method `analysis_bayesian()`

Method `clone()`