Introduction

The PF package is a collection of functions related to estimating prevented fraction, $PF=1-RR$ , where $RR={{{\pi}{2}}}/{{{\pi}{1}}}\;$ .

Technical notes

Optimization. Unless otherwise stated, optimization is by the DUD algorithm [@RJ78].

Level tested. The help files indicate the level of testing undergone by each function. In some cases that is a subjective judgement, since most of these functions were originally tested in SPlus and have been ported to R more recently.

Score based methods

The score statistic

Confidence intervals for the risk ratio may be based on the score statistic [@Koop84, @MN85], [\frac{{{{\hat{\pi }}}{2}}-{{\rho }{0}}{{{\hat{\pi }}}{1}}}{\sqrt{\left( {{{{\tilde{\pi }}}{2}}(1-{{{\tilde{\pi }}}{2}})}/{{{n}{2}}}\; \right)+\rho {0}^{2}\left( {{{{\tilde{\pi }}}{1}}(1-{{{\tilde{\pi }}}{1}})}/{{{n}{1}}}\; \right)}}] where hat indicates the MLE and tilde indicates the MLE under the restriction that $\rho ={{\rho }_{0}}$.

Asymptotic intervals

RRsc() estimates asymptotic confidence intervals for the risk ratio or prevented fraction based on the score statistic. Interval estimates are returned for three estimators. The score method was originally introduced by [@Koop84]. Gart and Nam's modification includes a skewness correction [@GN88]. The method of [@MN85] is a version made slightly more conservative than Koopman's by including a factor of ${(N-1)}/{N}\;$.

require(PF)
RRsc(c(4, 24, 12, 28))

Starting estimates for the algorithm are obtained by the modified Katz method (log method with 0.5 added to each cell). Both forms of the Katz estimate may be retrieved from the returned object using RRsc()\$estimate.

Exact intervals

These methods give intervals that are `exact' in the sense that they are based on the actual sampling distribution rather than an approximation to it. The score statistic is used to select the $2 \times 2$ tables that would fall in the tail area, and the binomial probability is estimated over the tail area by taking the maximum over the nuisance parameter. The search over the tail area is made more efficient by the Berger-Boos correction [@BB94].

RRtosst() gives intervals obtained by inverting two one-sided score tests; RRotsst() gives intervals obtained by inverting one two-sided score test. RRtosst() is thus more conservative, preserving at least ${\alpha }/{2}\;$ in each tail. [@AM01] discuss the properties and relative benefits of the two approaches. The price of exactnesss is conservatism, due to the discreteness of the binomial distribution [@AA01]. This means that the actual coverage of the confidence interval does not exactly conform to the nominal coverage, but it will not be less than it. (See also [@AA03].) Both functions use a simple step search algorithm.

RRotsst(c(4, 24, 12, 28))
RRtosst(c(4, 24, 12, 28))

Stratified designs

Methods for estimating a common RR from stratified or clustered designs depend on homogeneity with respect to the common parameter.

Gart-Nam score method

[@Gart85] and [@GN88] derived a score statistic for a common estimate of RR from designs with multiple independent strata, and they showed that it is identical to one proposed by [@Rad65] from a different perspective.

RRstr() provides confidence intervals and a homogeneity test based on Gart's statistic.

Data may be input two ways, either using a formula and data frame, or as a matrix.

RRstr(cbind(y, n) ~ tx + cluster(clus), Table6,
      compare = c("a", "b"), pf = FALSE)
# Data matrix input: RRstr(Y = table6, pf = F)

Mantel-Haenszel estimator

A widely-used heuristic method for sparse frequency tables is the weighted average approach of [@MH59]^[[@KLK88] review the Mantel-Haenzel approach and point out its relationship to a method proposed by [@Coc54], which was the basis of Rhadakrishnan's method [@Rad65], alluded to in Section 3.1.]. MH interval estimates are based on the asymptotic normality of the log of the risk ratio. RRmh() utilizes the variance estimator given by [@GR85] for sparse strata. The resulting asymptotic estimator is consistent for both the case of sparse strata where the number of strata is assumed increasing, and the case of limited number of strata where the stratum size is assumed increasing. In the latter case, however, it is less efficient than maximum likelihood [@AH00, @GR85]. Additional discussion may be found in Section 4.3.1 [@Lachin00], [@LSKK05], and [@SO06] ^[SAS Proc FREQ provides MH interval estimates of RR. The other estimator calculated by Proc FREQ, which it calls ``logit,'' is actually a weighted least squares estimator [@Lachin00] that has a demonstrable and severe bias for sparse data [@GR85]. It should be avoided.].

RRmh(cbind(y, n) ~ tx + cluster(clus), Table6,
     compare = c("a", "b"), pf = FALSE)
# Data matrix input: RRmh(Y = table6, pf = F)

Examples

For a fuller set of examples, see the vignette Examples for Stratified Designs.

Model based intervals

Logistic regression estimates

Intervals may be estimated from logistic regression models with RRor(). It takes the fit of a glm() object and estimates the intervals by the delta method.

bird.fit <- glm(cbind(y, n - y) ~ tx - 1, binomial, bird)
RRor(bird.fit)

Estimating the dispersion parameter

The binomial GLM weights are [\frac{\hat{\pi }(1-\hat{\pi })}{a(\hat{\varphi })/n\;}] where $a(\hat{\varphi })$ is a function of the dispersion parameter.

Dispersion parameter $\varphi$

A simple estimator of the dispersion parameter, $\varphi$, may be estimated by the method of moments [Wed74]. It is given by phiWt(). This form of the dispersion parameter has $a(\varphi)=\varphi$, and $\varphi$ is estimated by ${X^2}/{df}\;$, the Pearson statistic divided by the degrees of freedom.

Note that $\varphi$ is the same estimator as may be obtained by the quasibinomial family in glm() which is, in fact, what is used by phiWt() to reweight the original fit:

phiWt(bird.fit, fit.only = FALSE)$phi
summary(update(bird.fit, family = quasibinomial))$disp

phiWt() makes it easy to estimate PF intervals with a single command.

# model weighted by phi hat
RRor(phiWt(bird.fit))

It also allows different estimates of ${\hat{\varphi}}$ for specified subsets of the data.

# model with separate phi for vaccinates and controls
RRor(phiWt(bird.fit, subset.factor = bird$tx))

If you want to subtract a degree of freedom for each additional parameter, you can do that by entering the degrees of freedom as an argument to RRor().

# subtract 2 degrees of freedom
RRor(phiWt(bird.fit, subset.factor = bird$tx), degf = 2)

Dispersion parameter $\tau$

When overdispersion is due to intra-cluster correlation, it may make sense to estimate the dispersion as a function of the intra-cluster correlation parameter $\tau$. In other words, ${a({\varphi }{ij})}=1+{{\tau }{j}}({{n}_{ij}}-1)$. tauWt() does this using the Williams procedure [@Wil82].

# model weighted using tau hat
RRor(tauWt(bird.fit, subset.factor = bird$tx))

In this example the tauWt() estimates are the same as the phiWt() estimates. That is because the cluster sizes are all the same. Let's see what happens if we modify the bird data set. The birdm data set has the same cluster fractions but differing cluster sizes.

# different cluster sizes, same cluster fractions
birdm.fit <- glm(cbind(y, n - y) ~ tx - 1, binomial, birdm)
RRor(tauWt(birdm.fit, subset.factor = birdm$tx))

Note that increasing cluster size can make things worse when there is intra-cluster correlation.

Now let's compare the weights from phiWt() and tauWt() with unequal cluster sizes. In the output below, w represents $1/a(\hat{\varphi })\;$ and nw is $n/a(\hat{\varphi })\;$

# Compare phi and tau weights
#
phi.wts <- phiWt(birdm.fit, fit.only = FALSE, subset.factor = birdm$tx)$weights
tau.wts <- tauWt(birdm.fit, fit.only = FALSE, subset.factor = birdm$tx)$weights
w <- cbind(w.phi  = phi.wts,
           w.tau  = tau.wts,
           nw.phi = phi.wts * birdm$n,
           nw.tau = tau.wts * birdm$n)
print(cbind(birdm[, c(3, 1, 2)], round(w, 2)), row.names = FALSE)

Look at the last two rows. Note that the nw.phi are directly proportional to n within treatment group, while the nw.tau are not. With intra-cluster correlation, increasing cluster size does not give a corresponding increase in information.

Rao-Scott weights

[@RS92] give a method of weighting clustered binomial observations based on the variance inflation due to clustering. They relate their approach to the concepts of design effect and effective sample size familiar in survey sampling, and they illustrate its use in a variety of contexts. rsbWt() implements it in the same manner as phiWt() and tauWt(). For more general use, the function rsb() just returns the design effect estimates and the weights.

# model weighted with Rao-Scott weights
RRor(rsbWt(birdm.fit, subset.factor = birdm$tx))
# just the design effect estimates
rsb(birdm$y, birdm$n)$d

Likelihood based intervals

The RRlsi() function estimates likelihood support intervals for RR by the profile likelihood Section 7.6 [@Roy97].

Likelihood support intervals are usually formed based on the desired likelihood ratio, ${1}/{k}\;$, often ${1}/{8}\;$ or ${1}/{32}\;$. Under some conditions the log likelihood ratio may follow the chi-square distribution. If so, then $\alpha =1-{{F}_{{{\chi }^{2}}}}\left( 2\log (k),1 \right)$. RRsc() will make the conversion from $\alpha$ to $k$ with the argument use.alpha = T.

RRlsi(c(4, 24, 12, 28))
RRlsi(c(4, 24, 12, 28), use.alpha = TRUE)

Incidence ratio

The incidence is the number of cases per subject-time. Its distribution is assumed Poisson. Under certain designs, the incidence ratio ($IR$) is used as a measure of treatment effect. Correspondingly, ${{PF}_{IR}}=1-IR$ would be used as a measure of effect for an intervention that is preventive, such as vaccination. $IR$ is also called incidence density ratio ($IDR$), and that is the term used in the following functions.

Score method

IDRsc() estimates a confidence interval for the incidence density ratio using Siev's formula Appendix 1 [@Siev94] based on the Poisson score statistic^[This formula was published in a Japanese journal [@Sato88] several years before Siev. See also [@GMM03] and [@Siev04].]. [IDR=\widehat{IDR}\left{ 1+\left( \frac{1}{{{y}{1}}}+\frac{1}{{{y}{2}}} \right)\frac{z_{\alpha /2}^{2}}{2}\ \ \pm \ \ \frac{z_{\alpha /2}^{2}}{2{{y}{1}}{{y}{2}}}\sqrt{{{y}{\bullet }}\left( {{y}{\bullet }}z_{\alpha /2}^{2}+4{{y}{1}}{{y}{2}} \right)} \right}]

IDRsc(c(26, 204, 10, 205), pf = FALSE)

Likelihood method

A likelihood support interval for $IDR$ may be estimated based on orthogonal factoring of the reparameterized likelihood. Section 7.2 [@Roy97] IDRlsi() implements this method.

Likelihood support intervals are usually formed based on the desired likelihood ratio, ${1}/{k}\;$, often ${1}/{8}\;$ or ${1}/{32}\;$. Under some conditions the log likelihood ratio may follow the chi square distribution. If so, then $\alpha =1-{{F}_{{{\chi }^{2}}}}\left( 2\log (k),1 \right)$. IDRlsi() will make the conversion from $\alpha$ tp $k$ with the argument use.alpha = T.

IDRlsi(c(26, 204, 10, 205), pf = FALSE)
IDRlsi(c(26, 204, 10, 205), pf = FALSE, use.alpha = TRUE)

References



ABS-dev/PF documentation built on Sept. 19, 2024, 10:31 a.m.