Simulation-based estimation of power for the two-phase study design
Description
Monte Carlo-based estimation of statistical power for estimators of the components of a logistic regression model, based on balanced two-phase and case-control study designs (Breslow and Chatterjee, 1999; Prentice and Pyke, 1979).
Usage
Arguments
B 
The number of datasets generated by the simulation. 
betaTruth 
Regression coefficients from the logistic regression model. 
X 
Design matrix for the logistic regression model. The first column should correspond to the intercept. For each exposure, the baseline group should be coded as 0, the first level as 1, and so on. 
N 
A numeric vector providing the sample size for each row of the design matrix, X. 
strata 
A numeric vector indicating which columns of the design matrix, X, are used to form the phase I stratification. 
expandX 
Character vector indicating which columns of X are to be expanded via factor() (see Details). 
etaTerms 
Character vector indicating which columns of X are to be included in the model (see Details). 
nII 
A numeric value indicating the phase II sample size. If a vector is provided, separate simulations are run for each element. 
alpha 
Type I error rate assumed for the evaluation of coverage probabilities and power. 
digits 
Integer indicating the precision to be used for the output. 
betaNames 
An optional character vector of names for the regression coefficients, betaTruth.

monitor 
Numeric value indicating how often 
cohort 
Logical flag. TRUE indicates phase I is drawn as a cohort; FALSE indicates phase I is drawn as a case-control sample. 
NI 
A pair of integers providing the outcome-specific phase I sample sizes when the phase I data are drawn as a case-control sample. The first element corresponds to the controls and the second to the cases. 
Details
A simulation study is performed to estimate power for various estimators of beta:
(a) complete data maximum likelihood (CD)
(b) case-control maximum likelihood (CC)
(c) two-phase weighted likelihood (WL)
(d) two-phase pseudo or profile likelihood (PL)
(e) two-phase maximum likelihood (ML)
The overall simulation approach is the same as that described in tpsSim.
In each case, power is estimated as the proportion of simulated datasets for which a hypothesis test of no effect is rejected.
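The idea of estimating power as a rejection proportion can be sketched in base R. This is a minimal illustration only, not the internals of tpsPower(): it assumes simple random sampling rather than a two-phase design, a single binary exposure, and a two-sided Wald test. The function name simPower and its argument n are hypothetical and chosen for the example.

```r
## Minimal sketch (assumptions: simple random sampling, one binary exposure;
## this is not how tpsPower() is implemented internally).
simPower <- function(B, n, betaTruth, alpha = 0.05) {
  reject <- logical(B)
  for (b in seq_len(B)) {
    x <- rbinom(n, size = 1, prob = 0.5)           # binary exposure
    p <- plogis(betaTruth[1] + betaTruth[2] * x)   # true outcome probabilities
    y <- rbinom(n, size = 1, prob = p)             # simulated binary outcomes
    fit <- glm(y ~ x, family = binomial)
    est <- coef(summary(fit))["x", ]
    ## Two-sided Wald test of H0: coefficient of x is zero
    z <- est["Estimate"] / est["Std. Error"]
    reject[b] <- abs(z) > qnorm(1 - alpha / 2)
  }
  mean(reject)   # proportion of simulated datasets in which H0 is rejected
}

set.seed(1)
powerEst <- simPower(B = 200, n = 500, betaTruth = c(-1, 0.5))
powerEst
```

Increasing B reduces the Monte Carlo error of the estimate at the cost of run time.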
The correspondence between betaTruth and X, specifically the ordering of elements, is based on successive application of factor() to each column of X that is expanded via the expandX argument. Each exposure that is expanded must conform to a 0, 1, 2, ... integer-based coding convention.
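The ordering of coefficients implied by this kind of expansion can be illustrated with base R. The sketch below uses model.matrix() as a stand-in for the expansion step, which is an assumption made for illustration; the design matrix and the coefficient values are hypothetical.

```r
## Hypothetical design matrix: intercept, a three-level exposure coded 0/1/2,
## and two binary exposures coded 0/1.
X <- cbind(Int = 1, expand.grid(Age = 0:2, Sex = 0:1, Race = 0:1))

## Expanding Age with factor() gives one indicator per non-baseline level,
## so the coefficient vector needs five elements in the order shown.
mm <- model.matrix(~ factor(Age) + Sex + Race, data = X)
colnames(mm)

## Hypothetical coefficient vector in the matching order.
betaTruth <- c(Int = -2, Age1 = 0.3, Age2 = 0.6, Sex = 0.2, Race = 0.4)
stopifnot(length(betaTruth) == ncol(mm))
```

Checking the length of the coefficient vector against the expanded column count is a quick way to catch an ordering or coding mismatch before running a long simulation.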
The etaTerms argument is useful when only certain columns in X are to be included in the model. In the context of the two-phase design, this might be the case if phase I stratifies on some surrogate exposure and a more detailed/accurate measure is to be included in the main model.
Only balanced designs are considered by tpsPower(). For unbalanced designs, power estimates can be obtained from tpsSim.
NOTE: In some settings, the current implementation of the ML estimator returns point estimates that do not satisfy the phase I and/or phase II constraints. If this is the case, a warning is printed and the "fail" element of the returned list is set to TRUE. An example of this phenomenon is given in the help file for tps. When this occurs, tpsPower() considers ML estimation for the particular dataset to have failed.
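As a rough illustration of how failed datasets can be excluded from a power estimate, the sketch below uses hypothetical per-dataset test results. Marking failures as NA is an assumed convention for this example and may differ from what tpsPower() does internally.

```r
## Hypothetical per-dataset results: TRUE/FALSE records whether the test
## rejected; NA marks a dataset for which estimation failed (assumed convention).
reject <- c(TRUE, FALSE, TRUE, NA, TRUE, TRUE, NA, FALSE)

nFailed <- sum(is.na(reject))         # number of datasets excluded
power <- mean(reject, na.rm = TRUE)   # rejection proportion among the rest
```

Reporting the number of exclusions alongside the power estimate makes it clear how many datasets the estimate is actually based on.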
Value
tpsPower() returns an object of class "tpsPower", a list containing all the input arguments, as well as the following components:
betaPower 
Power against the null hypothesis that the regression coefficient is zero for a Wald-based test with an alpha type I error rate. 
failed 
A vector consisting of the number of datasets excluded from the power calculations (i.e. set to 
Note
A generic print method provides formatted output of the results.
A generic plot function, plotPower, provides plots of power against sample size for each estimate of a regression coefficient.
Author(s)
Sebastien Haneuse, Takumi Saegusa
References
Prentice, R. and Pyke, R. (1979) "Logistic disease incidence models and case-control studies." Biometrika 66:403-411.
Breslow, N. and Chatterjee, N. (1999) "Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis." Applied Statistics 48:457-468.
Haneuse, S., Saegusa, T. and Lumley, T. (2011) "osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies." Journal of Statistical Software 43(11):1-29.
See Also
plotPower.
Examples
library(osDesign)
##
data(Ohio)
##
XM <- cbind(Int=1, Ohio[,1:3])
fitM <- glm(cbind(Death, NDeath) ~ factor(Age) + Sex + Race, data=Ohio,
            family=binomial)
betaNamesM <- c("Int", "Age1", "Age2", "Sex", "Race")
## Power for the TPS design where phase I stratification is based on Race
## (column 4 of XM).
##
## Not run:
tpsResult1 <- tpsPower(B=1000, betaTruth=fitM$coef, X=XM, N=Ohio$N, strata=4,
                       nII=seq(from=100, to=1000, by=100),
                       betaNames=betaNamesM, monitor=100)
tpsResult1
## End(Not run)
## Power for the TPS design where phase I stratification is based on Age
## (column 2 of XM)
## * consider the setting where the age coefficients are halved from
##   their observed values
## * the intercept is modified, accordingly, using the beta0() function
##
newBetaM <- fitM$coef
newBetaM[2:3] <- newBetaM[2:3] / 2
## beta0() takes the non-intercept coefficients, hence newBetaM[-1]
newBetaM[1] <- beta0(betaX=newBetaM[-1], X=XM, N=Ohio$N,
                     rhoY=sum(Ohio$Death)/sum(Ohio$N))
##
## Not run:
## Use the modified coefficients, newBetaM, as the truth in this setting
tpsResult2 <- tpsPower(B=1000, betaTruth=newBetaM, X=XM, N=Ohio$N, strata=2,
                       nII=seq(from=100, to=500, by=50),
                       betaNames=betaNamesM, monitor=100)
tpsResult2
## End(Not run)
