Design | R Documentation |
A GLMM Design
A GLMM Design
An R6 class representing a GLMM and study design
For the generalised linear mixed model
Y \sim F(μ,σ)
μ = h^-1(Xβ + Zγ)
γ \sim MVN(0,D)
where h is the link function. A Design in comprised of a MeanFunction object, which defines the family F, link function h, and fixed effects design matrix X, and a Covariance object, which defines Z and D. The class provides methods for analysis and simulation with these models.
This class provides methods for: data simulation (sim_data()
and fitted()
), model fitting using Markov Chain
Monte Carlo Maximum Likelihood (MCML) methods (MCML()
), design analysis via simulation including power (analysis()
),
deletion diagnostics (dfbeta()
), and permutation tests including p-values and confidence intervals (permutation()
).
The class by default calculates the covariance matrix of the observations as:
Σ = W^{-1} + ZDZ^T
where W is a diagonal matrix with the WLS iterated weights for each observation equal to, for individual i φ a_i v(μ_i)[h'(μ_i)]^2 (see Table 2.1 in McCullagh and Nelder (1989) ISBN:9780412317606). For very large designs, this can be disabled as the memory requirements can be prohibitive.
Calls the respective print methods of the linked covariance and mean function objects.
The matrices X and Z both have n rows, where n is the number of observations in the model/design.
Number of clusters Returns a data frame describing the number of independent clusters or groups at each level in the design. For example, if there were cluster-periods nested in clusters, then the top level would be clusters, and the second level would be cluster periods.
analysis
The analysis function conducts a detailed design analysis using the analysis
model specified by the object. Data are simulated either using the same
data generating process, or using a different Design object specified by
the user to allow for model misspecification. On each iteration the model
is estimated with the simulated data y using either the MCML
function or
approximate parameters and standard errors using generalised least squares.
MCML is an exact maximum likelihood algorithm, and can be slow, so results
from previous simulations are saved in the design object and can be recalled
later. Deletion diagnostics are also calculated to calculate influential parts of
the design, although these are typically not useful for balanced
experimental designs.
The function returns an glmmr.sim
object, which estimates and summarises:
Model fitting and simulation diagnostics Convergence failure percent and coverage. Maximum likelihood estimators for GLMMs can fail to converge or reach the MLE. GLMMs can also be susceptible to small sample biases where the relevant sample size is at the level of clustering and correlation.
Error rates Type 2 (power), Type S (significance), and Type M (magnitude) errors are reported.
p-value and confidence interval width distributions
Deletion diagnostics For unbalanced designs and under model misspecifications, certain parts of the design may have more influence than others over the estimate of interest, or have a larger than desired effect. A summary of the DFBETA diagnostic is provided.
Approximate power Calculates the approximate power of the design using the square root of the relevant element of the GLS variance matrix:
(X^TΣ^{-1}X)^{-1}
Note that this is equivalent to using the "design effect" for many models.
MCMCML Fits generalised linear mixed models using one of three algorithms: Markov Chain Newton Raphson (MCNR), Markov Chain Expectation Maximisation (MCEM), or Maximum simulated likelihood (MSL). All the algorithms are described by McCullagh (1997). For each iteration of the algorithm the unobserved random effect terms (γ) are simulated using Markov Chain Monte Carlo (MCMC) methods (we use Hamiltonian Monte Carlo through Stan), and then these values are conditioned on in the subsequent steps to estimate the covariance parameters and the mean function parameters (β). For all the algorithms, the covariance parameter estimates are updated using an expectation maximisation step. For the mean function parameters you can either use a Newton Raphson step (MCNR) or an expectation maximisation step (MCEM). A simulated likelihood step can be added at the end of either MCNR or MCEM, which uses an importance sampling technique to refine the parameter estimates.
The accuracy of the algorithm depends on the user specified tolerance. For higher levels of tolerance, larger numbers of MCMC samples are likely need to sufficiently reduce Monte Carlo error.
The function also offers different methods of obtaining standard errors. First, one can generate
estimates from the estimated Hessian matrix (se.method = 'lik'
). Second, there are robust standard
errors using a sandwich estimator based on White (1982) (se.method = 'robust'
).
Third, there are use approximate generalised least squares estimates based on the maximum likelihood
estimates of the covariance
parameters (se.method = 'approx'
), or use a permutation test approach (se.method = 'perm'
).
Note that the permutation test can be accessed separately with the function permutation_test()
.
There are several options that can be specified to the function using the options
argument.
The options should be provided as a list, e.g. options = list(method="mcnr")
. The possible options are:
b_se_only
TRUE (calculate standard errors of the mean function parameters only) or FALSE (calculate
all standard errors), default it FALSE.
use_cmdstanr
TRUE (uses cmdstanr
for the MCMC sampling, requires cmdstanr), or FALSE (uses rstan
). Default is FALSE.
skip_cov_optim
TRUE (skips the covariance parameter estimation step, and uses the values covariance$parameters), or
FALSE (run the whole algorithm)], default is FALSE
sim_lik_step
TRUE (conduct a simulated likelihood step at the end of the algorithm), or FALSE (does
not do this step), defaults to FALSE.
no_warnings
TRUE (do not report any warnings) or FALSE (report warnings), default to FALSE
perm_type
Either cov
(use weighted test statistic in permutation test) or unw
(use unweighted
test statistic), defaults to cov
. See permutation_test()
.
perm_iter
Number of iterations for the permutation test, default is 100.
perm_parallel
TRUE (run permuation test in parallel) or FALSE (runs on a single thread), default to TRUE
warmup_iter
Number of warmup iterations on each iteration for the MCMC sampler, default is 500
perm_ci_steps
Number of steps for the confidence interval search procedure if using the permutation
test, default is 1000. See permutation_test()
.
fd_tol
The tolerance of the first difference method to estimate the Hessian and Gradient, default
is 1e-4.
Permutation tests
If the user provided a re-randomisation function to the linked mean function object (see MeanFunction),
then a permuation test can be conducted. A new random assignment is generated on each iteration of the permutation test.
The test statistic can be either a quasi-score statistic, weighting the observations using the covariance matrix (type="cov"
),
or an unweighted statistic that weights each observation in each cluster equally (type="unw"
). The 1-alpha%
confidence interval limits are estimated using an efficient iterative stochastic search procedure. On each step of the algorithm
a single permuation and test statistic is generated, and the current estimate of the confidence interval limit either
increased or decreased depedning on its value. The procedure converges in probability to the true limits, see Watson et al (2021)
and Garthwaite (1996).
MCMC
Draws samples from the posterior distribution of the model parameters using Stan. Priors are specified using the priors
argument.
Currently, only Gaussian (or half-Gaussian for covariance parameters) prior distributions are supported. The argument priors
accepts a list with three or four elements, prior_b_mean
, prior_b_sd
which are vectors specifying the prior mean and
standard deviation for the mean function parameters β in the model; prior_g_sd
specifying the prior standard deviation
for the half-Gaussian prior for the covariance parameters, and optionally sigma_sd
for the half-Gaussian prior for the scale
terms in models that have a scale parameter (Gaussian and Gamma currently). By default the function uses rstan
, however the function
can optionally call cmdstanr
instead if it is installed, using the option use_cmdstanr=TRUE
. For further details on the use and
arguments that can be used, see sampling.
Bayesian design analysis
The Bayesian design analysis conducts multiple iterations of simulating data from a GLMM and then fitting a GLMM to analyse
the properties of the posterior distribution and model calibration to inform study design. Data are either simulated from the same
model as the analysis model, i.e. there is correct model specification (nothing is specified for the option sim_design
), or data
are simulated from a different model (a Design
object passed to the sim_design
argument of this function) and then fit using
the Design
object from which this function was called, i.e. with model misspecification. The function analyses four related statistics
summarising the design. First, the expected posterior variance; second, the variance of the posterior variance; third, the probability the
parameter of interest will be greater than some threshold; and fourth, simulation-based calibration (see Talts et al, 2021).
Model fitting is done using Stan's sampler (see sampling and the MCMC()
function in this class).
covariance
A Covariance object defining the random effects covariance.
mean_function
A MeanFunction object, defining the mean function for the model, including the data and covariate design matrix X.
exp_condition
A vector indicting the unique experimental conditions for each observation, see Details.
Sigma
The overall covariance matrix for the observations. Calculated and updated automatically as W^{-1} + ZDZ^T where W is an n x n diagonal matrix with elements on the diagonal equal to the GLM iterated weights. See Details.
var_par
Scale parameter required for some distributions (Gaussian, Gamma, Beta).
fitted()
Return predicted values based on the currently stored parameter values in mean_function
Design$fitted(type = "link")
type
One of either "link
" for values on the scale of the link function, or "response
"
for values on the scale of the response
A Matrix class object containing the predicted values
new()
Create a new Design object
Design$new( covariance, mean.function, var_par = NULL, verbose = TRUE, skip.sigma = FALSE )
covariance
Either a Covariance object, or an equivalent list of arguments
that can be passed to Covariance
to create a new object.
mean.function
Either a MeanFunction object, or an equivalent list of arguments
that can be passed to MeanFunction
to create a new object.
var_par
Scale parameter required for some distributions, including Gaussian. Default is NULL.
verbose
Logical indicating whether to provide detailed output
skip.sigma
Logical indicating whether to skip the creating of the covariance matrix Sigma. For very large designs with thousands of observations or more, the covariance matrix will be too big to fit in memory, so this option will prevent sigma being created.
A new Design class object
#create a data frame describing a cross-sectional parallel cluster #randomised trial df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) #alternatively we can pass the data directly to Design #here we will specify a cohort study df <- nelder(~ind(20) > t(6)) df$int <- 0 df[df$t > 3, 'int'] <- 1 des <- Design$new( covariance = list( data=df, formula = ~ (1|ar1(t)*gr(ind)), parameters = c(0.8,1)), mean.function = list( formula = ~int + factor(t), data=df, parameters = rep(0,7), family = poisson())) #an example of a spatial grid with two time points df <- nelder(~ (x(10)*y(10))*t(2)) spt_design <- Design$new(covariance = list(data=df, formula = ~(1|fexp(x,y)*ar1(t)), parameters =c(0.2,0.1,0.8)), mean.function = list(data=df, formula = ~ 1, parameters = c(0.5), family=poisson()))
print()
Print method for Design
class
Design$print()
...
ignored
n()
Returns the number of observations in the model
Design$n(...)
...
ignored
n_cluster()
Returns the number of clusters at each level
Design$n_cluster()
...
ignored
A data frame with the level, number of clusters, and variables describing each level.
df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) des$n_cluster() ## returns two levels of 10 and 50
analysis()
Run a design analysis of the model via simulation
Design$analysis( type, iter, par, alpha = 0.05, sim_design, parallel = TRUE, verbose = TRUE, options = list(), ... )
type
One of either sim_data
(recalls saved data from a previous
call to this function), sim
(full simulation using MCML), or sim_approx
(
uses GLS to approximate the MLE and standard errors)
iter
Integer. The number of iterations of the simulation to run
par
Integer. The parameter of interest for which design analysis statistics should be calculated. Refers to the column of X.
alpha
Numeric. The type I error rate.
sim_design
Optional. A different Design
object that will be used to
simulate data to allow for model misspecification.
parallel
Logical indicating whether to run the simulations in parallel
verbose
Logical indicating whether to report detailed output. Defaults to TRUE.
options
Optional. A named list to pass to the options argument of MCML
...
Additional arguments passed to MCML
, see below.
A glmmr.sim
object containing the estimates from all the simulations, including
standard errors, deletion diagnostic statistics, and details about the simulation.
\dontrun{ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) # analysis using MCML mcem algorithm test1 <- des$analysis(type="sim", iter=100, par=6, parallel = FALSE, verbose = FALSE, method = "mcem", m = 100) #an analysis using the permutation test option and MCNR test2 <- des$analysis(type="sim", iter=100, se.method="perm", par=6, parallel = FALSE, verbose = FALSE, options = list( perm_type="unw", perm_iter=100, perm_parallel=FALSE,perm_ci_steps=1000) method = "mcnr", m = 100) #returning previously saved sim data test3 <- des$analysis(type="sim_data") #to test model misspecification we can simulate from a different model cov2 <- Covariance$new( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8) ) des2 <- Design$new( covariance = cov2, mean.function = mf1, var_par = 1 ) test4 <- des$analysis(type="sim", iter=100, sim_design = des2, par=6, parallel = FALSE, verbose = FALSE, method = "mcem", m = 100) }
power()
Approximate power of the design using the GLS variance
Design$power(par, value, alpha = 0.05)
par
Integer indicating which parameter of the design the power should be calculated for. Refers to the order of parameters and column of X
value
Numeric specifying the value of the parameter to calculate the power at
alpha
Numeric between zero and one indicating the type I error rate. Default of 0.05.
A value between zero and one indicating the approximate power of the design.
df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) des$power(6,0.5) #power of 0.79
subset_rows()
Subsets the design keeping specified observations only
Given a vector of row indices, the corresponding rows will be kept and the other rows will be removed from the mean function and covariance
Design$subset_rows(index)
index
Integer or vector integers listing the rows to keep
The function updates the object and nothing is returned
#generate a stepped wedge design and remove the first sequence des <- stepped_wedge(8,10,icc=0.05) ids_to_keep <- which(des$mean_function$data$J!=1) des$subset_rows(ids_to_keep)
subset_cols()
Subsets the columns of the design
Removes the specified columns from the linked mean function object's X matrix.
Design$subset_cols(index)
index
Integer or vector of integers specifying the indexes of the columns to keep
The function updates the object and nothing is returned
#generate a stepped wedge design and remove first and last time periods des <- stepped_wedge(8,10,icc=0.05) des$subset_cols(c(2:8,10))
plot()
Generates a plot of the design
Generates a 'bubble' plot of the design with respect to two or three variables in which the size of the points at each location are scaled by the number of observations at that location. For example, for a cluster randomised trial the user might specify time period on the x-axis and cluster ID on the y-axis. For a geospatial sampling design the x and y axes might represent spatial dimensions.
Design$plot(x, y, z = NULL, treat)
x
String naming a column in the data frame in the linked covariance object (self$covariance$data) for the x-axis
y
String naming a column in the data frame in the linked covariance object (self$covariance$data) for the y-axis
z
Optional. String naming a column in the data frame in the linked covariance object (self$covariance$data) for a 'third axis' used for faceting
treat
String naming a column in the data frame in the linked mean function object (self$mean_function$data) that identifies the treatment status of the observations at each location, used to set the colour of the points in the plot
A ggplot2 plot
\dontrun{ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) des$plot(x="cl",y="t",treat="int") }
sim_data()
Generates a realisation of the design
Generates a single vector of outcome data based upon the specified GLMM design
Design$sim_data(type = "y")
type
Either 'y' to return just the outcome data, or 'data' to return a data frame with the simulated outcome data alongside the model data
Either a vector or a data frame
df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data()
check()
Checks for any changes in linked objects and updates.
Checks for any changes in any object and updates all linked objects if any are detected. Generally called automatically.
Design$check(verbose = TRUE)
verbose
Logical indicating whether to report if any updates are made, defaults to TRUE
Linked objects are updated by nothing is returned
df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) des$check() #does nothing des$covariance$parameters <- c(0.1,0.9) des$check() #updates des$mean_function$parameters <- c(rnorm(5),0.1) des$check() #updates
MCML()
Markov Chain Monte Carlo Maximum Likelihood model fitting
Design$MCML( y, start, se.method = "lik", method = "mcnr", permutation.par, verbose = TRUE, tol = 0.01, m = 100, max.iter = 30, options = list() )
y
A numeric vector of outcome data
start
Optional. A numeric vector indicating starting values for the MCML algorithm iterations. If this is not specified then the parameter values stored in the linked mean function object will be used.
se.method
One of either 'lik'
, 'approx'
, 'perm'
, or 'none'
, see Details.
method
The MCML algorithm to use, either mcem
or mcnr
, see Details. Default is mcem
.
permutation.par
Optional. Integer specifing the index of the parameter if permutation tests are being used.
verbose
Logical indicating whether to provide detailed output, defaults to TRUE.
tol
Numeric value, tolerance of the MCML algorithm, the maximum difference in parameter estimates between iterations at which to stop the algorithm.
m
Integer. The number of MCMC draws of the random effects on each iteration.
max.iter
Integer. The maximum number of iterations of the MCML algorithm.
options
An optional list providing options to the algorithm, see Details.
A mcml
object
\dontrun{ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() # fits the models using MCEM but does not estimate standard errors fit1 <- des$MCML(y = ysim, se.method = "none") #fits the models using MCNR but does not estimate standard errors fit2 <- des$MCML(y = ysim, se.method = "none", method="mcnr") #fits the models and uses permutation tests for parameter of interest fit3 <- des$MCML(y = ysim, se.method = "perm", permutation.par = 6, options = list( perm_type="unw", perm_iter=1000, perm_parallel=FALSE, perm_ci_steps=1000 )) #adds a simulated likelihood step after the MCEM algorithm fit4 <- des$MCML(y = des$sim_data(), se.method = "none", options = list( sim_lik_step=TRUE )) }
dfbeta()
Calculates DFBETA deletion diagnostic values
Calculates the DFBETA deletion diagnostic statistic for each observation for the specified parameter.
Design$dfbeta(y, par)
y
Numeric vector of outcomes data
par
Integer indicating which parameter, as a column of X, to report the deletion diagnostics for.
A vector of length self$n() with the value of DFBETA for each observation for the specified parameter
df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() des$dfbeta(ysim,6)
permutation_test()
Conducts a permuation test
Estimates p-values and confidence intervals using a permutation test
Design$permutation_test( y, permutation.par, start, iter = 1000, nsteps = 1000, type = "cov", parallel = TRUE, verbose = TRUE )
y
Numeric vector of outcome data
permutation.par
Integer indicator which parameter to conduct a permutation test for. Refers to a column of the X matrix.
start
Value of the parameter. Used both as a starting value for the algorithms and as a best estimate for the confidence interval search procedure.
iter
Integer. Number of iterations of the permuation test to conduct
nsteps
Integer. Number of steps of the confidence interval search procedure
type
Either cov
for a test statistic weighted by the covariance matrix, or
unw
for an unweighted test statistic. See Details.
parallel
Logical indicating whether to run the tests in parallel
verbose
Logical indicating whether to report detailed output
A list with the estimated p-value and the estimated lower and upper 95% confidence interval
\dontrun{ df <- nelder(~(cl(6)*t(5)) > ind(5)) df$int <- 0 df[df$cl > 3, 'int'] <- 1 #generate function that produces random allocations treatf <- function(){ tr <- sample(rep(c(0,1),each=3),6,replace = FALSE) rep(tr,each=25) } mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family =gaussian(), treat_var = "int", random_function = treatf) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)), parameters = c(0.25)) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1) #run MCML to get parameter estimate: fit1 <- des$MCML(y = ysim, se.method = "none") perm1 <- des$permutation_test( y=ysim, permutation.par=6, start = fit1$coefficients$est[6], type="unw", iter = 1000, nsteps = 1000) }
MCMC()
Fit the GLMM using MCMC
Fit the GLMM using MCMC. The function calls a Stan program to draw posterior samples.
Design$MCMC( y, priors, warmup_iter = 1000, sampling_iter = 1000, chains = 4, use_cmdstanr = FALSE, parallel = TRUE, ... )
y
Numeric vector providing the outcome data
priors
A named list specifying the prior mean and standard deviation for the Gaussian prior distributions, see Details.
warmup_iter
A positive integer specifying the number of warmup iterations for each MCMC chain
sampling_iter
A positive integer specifying the number of sampling iterations for each MCMC chain
chains
A positive integer specifying the number of MCMC chains to run
use_cmdstanr
Logical indicating whether to use cmdstanr
, the default is FALSE, which will use rstan
parallel
Logical indicating whether to run the chains in parallel
...
Additional arguments to pass to sampling.
Either an S4 stanfit
object returned by sampling, or a cmdstanr
environment, depending on the sampler used.
\dontrun{ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() # fits the model with default priors fit1 <- des$MCMC(y=ysim) #specify priors fit2 <- des$MCMC(y=ysim, priors = list( prior_b_mean = rep(0,6), prior_b_sd = c(rep(3,5),1), prior_g_sd = rep(1,2))) }
analysis_bayesian()
Bayesian design analysis
Runs a Bayesian design analysis using simulated data
Design$analysis_bayesian( iter, par, priors, threshold = 0, sim_design, priors_sim, warmup_iter = 200, sampling_iter = 200, chains = 1, parallel = TRUE, verbose = TRUE, use_cmdstanr = FALSE, ... )
iter
A positive integer specifying the number of simulation iterations to run
par
A positive interger specifying the index of the parameter in the mean function to analyse, refers specifically to the column of X
priors
A named list specifying the priors, see Details in the MCMC()
function in this class
threshold
A number specifying the threshold. The probability that the parameter of interest is greater than this threshold will be calculated.
sim_design
Optional. A different Design
object that should be used to simulate the data for the simulation.
priors_sim
Optional. A named list of the same structure as priors
the specifies the priors from which to simulate data, if they are different
to the priors for model fitting.
warmup_iter
A positive integer specifying the number of warmup iterations for each MCMC chain when fitting the model on each simulation iteration.
sampling_iter
A positive integer specifying the number of sampling iterations for each MCMC chain when fitting the model on each simulation iteration.
chains
A positive integer specifying the number of MCMC chains when fitting the model on each simulation iteration.
parallel
Logical indicating whether to run the MCMC chains in parallel
verbose
Logical indicating whether to provide detailed output of the progress of simulations.
use_cmdstanr
Logical indicating whether to use cmdstanr
instead of rstan
, the default is FALSE (i.e. use rstan
)
...
Additional options to pass to rstan::sampling
or cmdstanr::sample
A glmmr.sim
object containing the samples of posterior variance and relevant probabilities to summarise the analysis
\dontrun{ #' df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) test <- des$analysis_bayesian(iter=10, par=6, warmup_iter = 200, sampling_iter = 500, parallel = FALSE, verbose=FALSE) #to examine misspecification we can sample from a different model des2 <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)), parameters = c(0.25)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) test2 <- des$analysis_bayesian(iter=10, par=6, sim_design=des2, warmup_iter = 200, sampling_iter = 500, parallel = FALSE, verbose=FALSE) }
clone()
The objects of this class are cloneable with this method.
Design$clone(deep = FALSE)
deep
Whether to make a deep clone.
Braun and Feng McCullagh Stan McCullagh and Nelder Approx GLMMs paper Watson confidence interval
Watson et al. Arxiv Braun and Feng Gail Garthwaite
SBC paper (Talts) APV paper
nelder, MeanFunction, Covariance
print.glmmr.sim, MCML
sampling, stan
## ------------------------------------------------ ## Method `Design$new` ## ------------------------------------------------ #create a data frame describing a cross-sectional parallel cluster #randomised trial df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) #alternatively we can pass the data directly to Design #here we will specify a cohort study df <- nelder(~ind(20) > t(6)) df$int <- 0 df[df$t > 3, 'int'] <- 1 des <- Design$new( covariance = list( data=df, formula = ~ (1|ar1(t)*gr(ind)), parameters = c(0.8,1)), mean.function = list( formula = ~int + factor(t), data=df, parameters = rep(0,7), family = poisson())) #an example of a spatial grid with two time points df <- nelder(~ (x(10)*y(10))*t(2)) spt_design <- Design$new(covariance = list(data=df, formula = ~(1|fexp(x,y)*ar1(t)), parameters =c(0.2,0.1,0.8)), mean.function = list(data=df, formula = ~ 1, parameters = c(0.5), family=poisson())) ## ------------------------------------------------ ## Method `Design$n_cluster` ## ------------------------------------------------ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) des$n_cluster() ## returns two levels of 10 and 50 ## ------------------------------------------------ ## Method `Design$analysis` ## ------------------------------------------------ ## Not run: df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) # analysis using MCML mcem algorithm test1 <- des$analysis(type="sim", iter=100, par=6, parallel = FALSE, verbose = FALSE, method = "mcem", m = 100) #an analysis using the permutation test option and MCNR test2 <- des$analysis(type="sim", iter=100, se.method="perm", par=6, parallel = FALSE, verbose = FALSE, options = list( perm_type="unw", perm_iter=100, perm_parallel=FALSE,perm_ci_steps=1000) method = "mcnr", m = 100) #returning previously saved sim data test3 <- des$analysis(type="sim_data") #to test model misspecification we can simulate from a different model cov2 <- Covariance$new( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8) ) des2 <- Design$new( covariance = cov2, mean.function = mf1, var_par = 1 ) test4 <- des$analysis(type="sim", iter=100, sim_design = des2, par=6, parallel = FALSE, verbose = FALSE, method = "mcem", m = 100) ## End(Not run) ## ------------------------------------------------ ## Method `Design$power` ## ------------------------------------------------ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = gaussian() ) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)) + (1|gr(cl*t)), parameters = c(0.25,0.1) ) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1 ) des$power(6,0.5) #power of 0.79 ## ------------------------------------------------ ## Method `Design$subset_rows` ## ------------------------------------------------ #generate a stepped wedge design and remove the first sequence des <- stepped_wedge(8,10,icc=0.05) ids_to_keep <- which(des$mean_function$data$J!=1) des$subset_rows(ids_to_keep) ## ------------------------------------------------ ## Method `Design$subset_cols` ## ------------------------------------------------ #generate a stepped wedge design and remove first and last time periods des <- stepped_wedge(8,10,icc=0.05) des$subset_cols(c(2:8,10)) ## ------------------------------------------------ ## Method `Design$plot` ## ------------------------------------------------ ## Not run: df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) des$plot(x="cl",y="t",treat="int") ## End(Not run) ## ------------------------------------------------ ## Method `Design$sim_data` ## ------------------------------------------------ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() ## ------------------------------------------------ ## Method `Design$check` ## ------------------------------------------------ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) des$check() #does nothing des$covariance$parameters <- c(0.1,0.9) des$check() #updates des$mean_function$parameters <- c(rnorm(5),0.1) des$check() #updates ## ------------------------------------------------ ## Method `Design$MCML` ## ------------------------------------------------ ## Not run: df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() # fits the models using MCEM but does not estimate standard errors fit1 <- des$MCML(y = ysim, se.method = "none") #fits the models using MCNR but does not estimate standard errors fit2 <- des$MCML(y = ysim, se.method = "none", method="mcnr") #fits the models and uses permutation tests for parameter of interest fit3 <- des$MCML(y = ysim, se.method = "perm", permutation.par = 6, options = list( perm_type="unw", perm_iter=1000, perm_parallel=FALSE, perm_ci_steps=1000 )) #adds a simulated likelihood step after the MCEM algorithm fit4 <- des$MCML(y = des$sim_data(), se.method = "none", options = list( sim_lik_step=TRUE )) ## End(Not run) ## ------------------------------------------------ ## Method `Design$dfbeta` ## ------------------------------------------------ df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() des$dfbeta(ysim,6) ## ------------------------------------------------ ## Method `Design$permutation_test` ## ------------------------------------------------ ## Not run: df <- nelder(~(cl(6)*t(5)) > ind(5)) df$int <- 0 df[df$cl > 3, 'int'] <- 1 #generate function that produces random allocations treatf <- function(){ tr <- sample(rep(c(0,1),each=3),6,replace = FALSE) rep(tr,each=25) } mf1 <- MeanFunction$new( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family =gaussian(), treat_var = "int", random_function = treatf) cov1 <- Covariance$new( data = df, formula = ~ (1|gr(cl)), parameters = c(0.25)) des <- Design$new( covariance = cov1, mean.function = mf1, var_par = 1) #run MCML to get parameter estimate: fit1 <- des$MCML(y = ysim, se.method = "none") perm1 <- des$permutation_test( y=ysim, permutation.par=6, start = fit1$coefficients$est[6], type="unw", iter = 1000, nsteps = 1000) ## End(Not run) ## ------------------------------------------------ ## Method `Design$MCMC` ## ------------------------------------------------ ## Not run: df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) ysim <- des$sim_data() # fits the model with default priors fit1 <- des$MCMC(y=ysim) #specify priors fit2 <- des$MCMC(y=ysim, priors = list( prior_b_mean = rep(0,6), prior_b_sd = c(rep(3,5),1), prior_g_sd = rep(1,2))) ## End(Not run) ## ------------------------------------------------ ## Method `Design$analysis_bayesian` ## ------------------------------------------------ ## Not run: #' df <- nelder(~(cl(10)*t(5)) > ind(10)) df$int <- 0 df[df$cl > 5, 'int'] <- 1 des <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)*ar1(t)), parameters = c(0.25,0.8)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) test <- des$analysis_bayesian(iter=10, par=6, warmup_iter = 200, sampling_iter = 500, parallel = FALSE, verbose=FALSE) #to examine misspecification we can sample from a different model des2 <- Design$new( covariance = list( data = df, formula = ~ (1|gr(cl)), parameters = c(0.25)), mean.function = list( formula = ~ factor(t) + int - 1, data=df, parameters = c(rep(0,5),0.6), family = binomial()) ) test2 <- des$analysis_bayesian(iter=10, par=6, sim_design=des2, warmup_iter = 200, sampling_iter = 500, parallel = FALSE, verbose=FALSE) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.