# Simulation-based estimation of power for the case-control design

### Description

Monte Carlo based estimation of statistical power for the maximum likelihood estimator (MLE) of the components of a logistic regression model, based on the case-control design.

### Usage


### Arguments

| Argument | Description |
| --- | --- |
| `B` | The number of datasets generated by the simulation. |
| `betaTruth` | Regression coefficients from the logistic regression model. |
| `X` | Design matrix for the logistic regression model. The first column should correspond to intercept. For each exposure, the baseline group should be coded as 0, the first level as 1, and so on. |
| `N` | A numeric vector providing the sample size for each row of the design matrix, |
| `expandX` | Character vector indicating which columns of |
| `etaTerms` | Character vector indicating which columns of |
| `nCC` | A numeric value indicating the total case-control sample size. If a vector is provided, separate simulations are run for each value. |
| `r` | A numeric value indicating the control:case ratio in the case-control sample. |
| `alpha` | Type I error rate assumed for the evaluation of coverage probabilities and power. |
| `digits` | Integer indicating the precision to be used for the output. |
| `betaNames` | An optional character vector of names for the regression coefficients, |
| `monitor` | Numeric value indicating how often |

### Details

A simulation study is conducted to evaluate statistical power for the MLE of a logistic regression model, based on the case-control design. The overall simulation approach is the same as that described in `ccSim`. Power is estimated as the proportion of simulated datasets for which the null hypothesis of no effect is rejected. Each hypothesis test is performed using the generic `glm` function.
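The idea of estimating power as a rejection proportion can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal algorithm: the population, the coefficient values, and the helper `sim_power()` are all hypothetical, and the sampling scheme (drawing cases and controls at random from one simulated population) is a simplification.

```r
set.seed(1)

# Hypothetical population with a single binary exposure (assumed values)
beta  <- c(-3, 0.7)   # assumed intercept and log odds ratio
x_pop <- rep(c(0, 1), times = c(60000, 40000))
y_pop <- rbinom(length(x_pop), 1, plogis(beta[1] + beta[2] * x_pop))

# Hypothetical helper: draw B case-control samples, fit a logistic model
# with glm(), and record whether the test of "no exposure effect" rejects
sim_power <- function(B, n_case, n_control, alpha = 0.05) {
  cases    <- which(y_pop == 1)
  controls <- which(y_pop == 0)
  reject   <- logical(B)
  for (b in seq_len(B)) {
    idx <- c(sample(cases, n_case), sample(controls, n_control))
    fit <- glm(y_pop[idx] ~ x_pop[idx], family = binomial)
    p   <- summary(fit)$coefficients[2, "Pr(>|z|)"]
    reject[b] <- p < alpha
  }
  mean(reject)   # proportion of datasets in which the null is rejected
}

power_hat <- sim_power(B = 200, n_case = 100, n_control = 100)
power_hat
```

Increasing `B` reduces the Monte Carlo error of the estimated power at the cost of run time.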

The correspondence between `betaTruth` and `X`, specifically the ordering of elements, is based on successive application of `factor` to each column of `X` that is expanded via the `expandX` argument. Each exposure that is expanded must conform to a 0, 1, 2, ... integer-based coding convention.
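To see the ordering that results from applying `factor` column by column, the following sketch expands a small, hypothetical design matrix with `model.matrix`. The matrix and its column names are made up for illustration, and the package may build its expanded design differently; the point is only the order in which the dummy variables appear.

```r
# Hypothetical design matrix: intercept, a three-level exposure coded 0/1/2,
# and a binary exposure coded 0/1
X <- cbind(Int = 1,
           Age = c(0, 1, 2, 0, 1, 2),
           Sex = c(0, 0, 0, 1, 1, 1))
Xdf <- as.data.frame(X)

# Apply factor() to each exposure column in turn
expanded <- model.matrix(~ factor(Age) + factor(Sex), data = Xdf)
colnames(expanded)
# "(Intercept)" "factor(Age)1" "factor(Age)2" "factor(Sex)1"
```

Under this expansion a coefficient vector would need four elements, ordered as intercept, the two non-baseline `Age` levels, then the non-baseline `Sex` level.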

The `etaTerms` argument is useful when only certain columns in `X` are to be included in the model.

A balanced case-control design is specified by setting `r`=1; setting `r`=2 indicates that twice as many controls as cases are sampled, out of the total `nCC`.
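As a worked illustration of the control:case ratio, the split of a total sample size into cases and controls can be computed as below. The helper `cc_split()` is hypothetical, and how the package handles totals that do not divide evenly is not stated here, so no rounding is applied.

```r
# Hypothetical helper: split a total case-control sample size nCC into
# cases and controls for a control:case ratio r
cc_split <- function(nCC, r) {
  n_case <- nCC / (1 + r)
  c(cases = n_case, controls = r * n_case)
}

cc_split(500, 1)  # balanced design: 250 cases, 250 controls
cc_split(600, 2)  # twice as many controls: 200 cases, 400 controls
```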

When evaluating operating characteristics of the MLE, some simulated datasets may result in unusually large or small estimates, particularly when the case-control sample size, `nCC`, is small. In some settings, it may be desirable to truncate the Monte Carlo sampling distribution prior to evaluating operating characteristics. The `threshold` argument indicates the interval beyond which MLEs are ignored. The default is such that all `B` datasets are kept.
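The effect of truncating the Monte Carlo sampling distribution can be sketched as follows. The estimates and the interval are hypothetical values chosen for illustration, and the package's own bookkeeping may differ.

```r
set.seed(2)

# Hypothetical Monte Carlo estimates: mostly well behaved, plus a few
# extreme values of the kind that small samples can produce
est <- c(rnorm(995, mean = 0.5, sd = 0.3),
         rep(c(-25, 30), length.out = 5))

threshold <- c(-10, 10)   # hypothetical interval; estimates outside it are ignored
keep      <- est >= threshold[1] & est <= threshold[2]

n_failed <- sum(!keep)    # number of datasets excluded
mean(est)                 # summary distorted by the extreme estimates
mean(est[keep])           # summary after truncation

# An unbounded interval keeps every dataset
all(est >= -Inf & est <= Inf)
```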

### Value

`ccPower()` returns an object of class "ccPower", a list containing all the input arguments, as well as the following components:

| Component | Description |
| --- | --- |
| `betaPower` | Power against the null hypothesis that the regression coefficient is zero for a Wald-based test with an |
| `failed` | A vector consisting of the number of datasets excluded from the power calculations (i.e. set to |
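For readers unfamiliar with Wald-based tests, the sketch below computes one by hand from a `glm` fit on simulated data and compares it with the p-values reported by `summary`. The data and coefficient values are hypothetical, and this shows the generic test rather than the package's internal code.

```r
set.seed(3)

# Hypothetical data: binary exposure and binary outcome
x <- rbinom(300, 1, 0.4)
y <- rbinom(300, 1, plogis(-0.5 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

alpha <- 0.05
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

z      <- est / se                         # Wald statistic for H0: coefficient = 0
reject <- abs(z) > qnorm(1 - alpha / 2)    # two-sided test at level alpha
p_wald <- 2 * pnorm(-abs(z))               # matching two-sided p-value

# The same p-values appear in the coefficient table of summary()
cbind(p_wald, summary(fit)$coefficients[, "Pr(>|z|)"])
```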

### Note

A generic print method provides formatted output of the results.

A generic plot function `plotPower` plots the estimated power for each regression coefficient against sample size.

### Author(s)

Sebastien Haneuse, Takumi Saegusa

### References

Prentice, R. and Pyke, R. (1979) "Logistic disease incidence models and case-control studies." Biometrika 66:403-411.

Haneuse, S. and Saegusa, T. and Lumley, T. (2011) "osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies." Journal of Statistical Software, 43(11), 1-29.

### Examples

```
## Load the package that provides ccPower(), beta0() and the Ohio data
library(osDesign)
data(Ohio)

## Design matrix (intercept plus the first three columns of Ohio) and
## "true" coefficients taken from a logistic regression fit to the full data
XM <- cbind(Int=1, Ohio[,1:3])
fitM <- glm(cbind(Death, N-Death) ~ factor(Age) + Sex + Race, data=Ohio,
            family=binomial)
betaNamesM <- c("Int", "Age1", "Age2", "Sex", "Race")

## Power for a single CC design
##
## Not run:
ccResult1 <- ccPower(B=1000, betaTruth=fitM$coef, X=XM, N=Ohio$N, r=1,
                     nCC=500, betaNames=betaNamesM)
ccResult1
## End(Not run)

## Power for the CC design, based on a balanced design with
## various sample sizes
##
## Not run:
ccResult2 <- ccPower(B=1000, betaTruth=fitM$coef, X=XM, N=Ohio$N, r=1,
                     nCC=seq(from=100, to=500, by=50), betaNames=betaNamesM)
ccResult2
## End(Not run)

## Recalculate power for the setting where the age coefficients are
## halved relative to their fitted values
## * the intercept is modified, accordingly, using the beta0() function
##
newBetaM <- fitM$coef
newBetaM[2:3] <- newBetaM[2:3] / 2
newBetaM[1] <- beta0(betaX=newBetaM[-1], X=XM, N=Ohio$N,
                     rhoY=sum(Ohio$Death)/sum(Ohio$N))
##
## Not run:
ccResult3 <- ccPower(B=1000, betaTruth=newBetaM, X=XM, N=Ohio$N,
                     r=1, nCC=seq(from=100, to=500, by=50),
                     betaNames=betaNamesM)
ccResult3
## End(Not run)
```