qgcomp.hurdle.boot: Quantile g-computation for hurdle count outcomes

Description Usage Arguments Details Value See Also Examples

View source: R/base_hurdle.R

Description

This function estimates a linear dose-response parameter representing a one quantile increase in a set of exposures of interest for hurdle count outcomes. This function is limited to linear and additive effects of individual components of the exposure. This model estimates the parameters of a marginal structural hurdle count model (MSM) based on g-computation with quantized exposures. Note: this function allows linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational computational burden due to the need for non-parametric bootstrapping.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
qgcomp.hurdle.boot(
  f,
  data,
  expnms = NULL,
  q = 4,
  breaks = NULL,
  id = NULL,
  weights,
  alpha = 0.05,
  B = 200,
  degree = 1,
  seed = NULL,
  bayes = FALSE,
  parallel = FALSE,
  MCsize = 10000,
  msmcontrol = hurdlemsm.fit.control(),
  parplan = FALSE,
  ...
)

Arguments

f

R style formula

data

data frame

expnms

character vector of exposures of interest

q

NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then gcomp proceeds with un-transformed version of exposures in the input datasets (useful if data are already transformed, or for performing standard g-computation)

breaks

(optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints.

id

(optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster)

weights

"case weights" - passed to the "weight" argument of hurdle. NOTE - this does not work with parallel=TRUE!

alpha

alpha level for confidence limit calculation

B

integer: number of bootstrap iterations (this should typically be >=200, though it is set lower in examples to improve run-time).

degree

polynomial basis function for marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic.

seed

integer or NULL: random number seed for replicable bootstrap results

bayes

not currently implemented.

parallel

use (safe) parallel processing from the future and future.apply packages

MCsize

integer: sample size for simulation to approximate marginal zero inflated model parameters. This can be left small for testing, but should be as large as needed to reduce simulation error to an acceptable magnitude (can compare psi coefficients for linear fits with qgcomp.hurdle.noboot to gain some intuition for the level of expected simulation error at a given value of MCsize)

msmcontrol

named list from hurdlemsm.fit.control

parplan

(logical, default=FALSE) automatically set future::plan to plan(multisession) (and set to existing plan, if any, after bootstrapping)

...

arguments to glm (e.g. family)

Details

Hurdle count models allow excess zeros in standard count outcome (e.g. Poisson distributed outcomes). Such models have two components: 1) the probability of arising from a degenerate distribution at zero (versus arising from a count distribution) and 2) the rate parameter of a (possibly truncated > 0) count distribution. Thus, one has the option of allowing exposure and covariate effects on the zero distribution, the count distribution, or both. The zero distribution parameters correspond to log-odds ratios for the probability of arising from the zero distribution. Count distribution parameters correspond to log-rate-ratio parameters. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities.

Of note, this function yields marginal estimates of the expected outcome under values of the joint exposure quantiles (e.g. the expected outcome if all exposures are below the 1st quartile). These outcomes can be used to derive estimates of the effect on the marginal expectation of the outcome, irrespective of zero/count portions of the statistical model.

Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in ‘expnms’. Test statistics and confidence intervals are based on a non-parametric bootstrap, using the standard deviation of the bootstrap estimates to estimate the standard error. The bootstrap standard error is then used to estimate Wald-type confidence intervals. Note that no bootstrapping is done on estimated quantiles of exposure, so these are treated as fixed quantities

Value

a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.

See Also

qgcomp.hurdle.noboot,qgcomp.boot, qgcomp.cox.boot, and hurdle

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
set.seed(50)
n=500
dat <- data.frame(y=rbinom(n, 1, 0.5)*rpois(n, 1.2), x1=runif(n), x2=runif(n), z=runif(n))
# poisson count model, mixture in both portions
## Not run: 
# warning: the examples below can take a long time to run
res = qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'), 
    data=dat, q=4, dist="poisson", B=1000, MCsize=10000, parallel=TRUE, parplan=TRUE)
qgcomp.hurdle.noboot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'), 
    data=dat, q=4, dist="poisson")
res

# accuracy for small MCsize is suspect (compare coefficients between boot/noboot versions), 
# so re-check with MCsize set to larger value (this takes a long time to run)
res2 = qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'), 
    data=dat, q=4, dist="poisson", B=1000, MCsize=50000, parallel=TRUE, parplan=TRUE)
 res2
plot(density(res2$bootsamps[4,]))

# negative binomial count model, mixture and covariate in both portions
qgcomp.hurdle.boot(f=y ~ z + x1 + x2 | z + x1 + x2, expnms = c('x1', 'x2'), 
   data=dat, q=4, dist="negbin", B=10, MCsize=10000) 
   
# weighted analysis (NOTE THIS DOES NOT WORK WITH parallel=TRUE!)
dat$w = runif(n)*5
qgcomp.hurdle.noboot(f=y ~ z + x1 + x2 | x1 + x2, expnms = c('x1', 'x2'), 
    data=dat, q=4, dist="poisson", weights=w)
# You may see this:     
# Warning message:
# In eval(family$initialize) : non-integer #successes in a binomial glm!
qgcomp.hurdle.boot(f=y ~ x1 + x2 | x1 + x2, expnms = c('x1', 'x2'), 
    data=dat, q=4, dist="poisson", B=5, MCsize=50000, parallel=FALSE, weights=w)
# Log rr per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.hurdle.boot(f=y ~ z + x1iqr + x2iqr + I(x2iqr>0.1) + 
                             I(x2iqr>0.4) + I(x2iqr>0.9) | x1iqr + x2iqr, 
                   expnms = c('x1iqr', 'x2iqr'), 
                   data=dat, q=NULL, B=2, degree=2, MCsize=2000, dist="poisson")
res2

## End(Not run)

qgcomp documentation built on Jan. 24, 2022, 5:08 p.m.