`phaseI()` provides the expected phase I counts, based on a pre-specified population and outcome model. If phase II sample sizes are provided, the (expected) phase II sampling probabilities are also reported.
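To make the two reported quantities concrete, here is a minimal base-R sketch of how expected phase I counts and phase II sampling probabilities could be computed by hand for a cohort. It does not reproduce the package's implementation; the population, coefficients, and phase II sample sizes are all hypothetical values chosen for illustration.

```r
# Inverse logit, mapping a linear predictor to a probability
expit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical population with one row per covariate pattern
X    <- cbind(Int = 1, Z = c(0, 1))   # intercept and one binary stratifier
N    <- c(1000, 500)                  # population size for each row of X
beta <- c(log(1/9), log(2))           # assumed: baseline risk 0.1, odds ratio 2

# Expected number of controls and cases in each phase I stratum
risk     <- as.vector(expit(X %*% beta))
expected <- cbind(controls = N * (1 - risk), cases = N * risk)
rownames(expected) <- paste0("Z=", X[, "Z"])

# Assumed phase II sample sizes per stratum (controls, then cases)
nII0 <- c(100, 100)
nII1 <- c(50, 50)

# Expected phase II sampling probability = phase II size / expected phase I count
samplingProb <- cbind(controls = nII0 / expected[, "controls"],
                      cases    = nII1 / expected[, "cases"])
round(samplingProb, 3)
```

In this sketch a sampling probability above 1 would signal that the requested phase II sample exceeds the expected phase I count for that cell.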


| Argument | Description |
|---|---|
| `betaTruth` | Regression coefficients from the logistic regression model. |
| `X` | Design matrix for the logistic regression model. The first column should correspond to the intercept. For each exposure, the baseline group should be coded as 0, the first level as 1, and so on. |
| `N` | A numeric vector providing the sample size for each row of the design matrix, `X`. |
| `strata` | A numeric vector indicating which columns of the design matrix, `X`, are used to form the phase I strata. |
| `expandX` | Character vector indicating which columns of `X` are to be expanded (see the details on `factor` below). |
| `etaTerms` | Character vector indicating which columns of `X` are to be included in the model. |
| `nII0` | A vector of sample sizes at phase II for controls. The length must correspond to the number of unique values of the phase I stratification variable. |
| `nII1` | A vector of sample sizes at phase II for cases. The length must correspond to the number of unique values of the phase I stratification variable. |
| `cohort` | Logical flag. `TRUE` indicates phase I is drawn as a cohort; `FALSE` indicates phase I is drawn as a case-control sample. |
| `NI` | A pair of integers providing the outcome-specific phase I sample sizes when the phase I data are drawn as a case-control sample. The first element corresponds to the controls and the second to the cases. |
| `digits` | Integer indicating the precision to be used for the reporting of the (expected) sampling probabilities. |

The correspondence between `betaTruth` and `X`, specifically the ordering of elements, is based on successive application of `factor` to each column of `X` that is expanded via the `expandX` argument. Each exposure that is expanded must conform to a 0, 1, 2, ... integer-based coding convention.
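As a rough illustration of that ordering, the base-R sketch below expands a small, hypothetical design matrix by applying `factor()` to each exposure through `model.matrix()`. The matrix and its column names are made up for the example, and this is not the package's internal code; it only shows the order in which one would expect the elements of `betaTruth` to line up.

```r
# Hypothetical design matrix: intercept, a 3-level exposure, a binary exposure,
# both coded 0, 1, 2, ... as the convention above requires
X <- cbind(Int = 1,
           Age = c(0, 1, 2, 0, 1, 2),
           Sex = c(0, 0, 0, 1, 1, 1))

# Apply factor() to each exposure column in turn; each k-level exposure
# contributes k - 1 dummy columns, in column order
expanded <- model.matrix(~ factor(Age) + factor(Sex), data = as.data.frame(X))

# Column names give the implied coefficient order:
# intercept, Age level 1, Age level 2, Sex level 1
colnames(expanded)
```

Under this reading, a `betaTruth` for such a matrix would have four elements. Taking the coefficients from a `glm()` fit that uses the same `factor()` terms, as the package example does, keeps the two in step.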

The `etaTerms` argument is useful when only certain columns in `X` are to be included in the model. In the context of the two-phase design, this might be the case if phase I stratifies on some surrogate exposure and a more detailed/accurate measure is to be included in the main model.
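The surrogate-exposure scenario can be sketched in base R as follows. This is only an illustration of the idea, not a call to `phaseI()`: the columns `Z` (surrogate) and `W` (detailed exposure), the counts, and the coefficients are hypothetical, and `modelCols` merely plays the role that `etaTerms` is described as playing.

```r
# Hypothetical columns: Z is a surrogate used only for phase I stratification,
# W is the detailed exposure that enters the outcome model
X <- cbind(Int = 1,
           Z   = c(0, 0, 1, 1),
           W   = c(0, 1, 0, 1))
N <- c(400, 100, 150, 350)        # population size for each row of X

modelCols <- c("Int", "W")        # columns included in the model (assumed)
beta      <- c(-2, 0.7)           # one assumed coefficient per model column

# Linear predictor built from the model columns only; Z is left out
eta   <- as.vector(X[, modelCols, drop = FALSE] %*% beta)
cases <- N / (1 + exp(-eta))      # expected cases per row under the logistic model

# Expected phase I cases within each stratum of the surrogate Z
byZ <- tapply(cases, X[, "Z"], sum)
byZ
```

The point of the sketch is that the stratifying column and the model columns need not coincide: `Z` defines the strata while only `Int` and `W` drive the expected outcome.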

Sebastien Haneuse, Takumi Saegusa

Haneuse, S., Saegusa, T., and Lumley, T. (2011) "osDesign: An R Package for the Analysis, Evaluation, and Design of Two-Phase and Case-Control Studies." Journal of Statistical Software, 43(11), 1-29.

```
## Load the package and the Ohio data
library(osDesign)
data(Ohio)

## Design matrices that form the basis for the model and the phase I
## strata specification
XM <- cbind(Int=1, Ohio[,1:3])        ## main effects only
XI <- cbind(XM, SbyR=XM[,3]*XM[,4])   ## interaction between sex and race

## 'True' values for the underlying logistic model
fitM <- glm(cbind(Death, N-Death) ~ factor(Age) + Sex + Race, data=Ohio,
            family=binomial)
fitI <- glm(cbind(Death, N-Death) ~ factor(Age) + Sex * Race, data=Ohio,
            family=binomial)

## Stratified sampling by race (column 4 of XM), with phase II sample sizes
phaseI(betaTruth=fitM$coef, X=XM, N=Ohio$N, strata=4,
       nII0=c(125, 125),
       nII1=c(125, 125))

## Stratified sampling by age and sex (columns 2 and 3 of XM)
phaseI(betaTruth=fitM$coef, X=XM, N=Ohio$N, strata=c(2,3))

## Same strata, now with phase II sample sizes for each of the 6 strata
phaseI(betaTruth=fitM$coef, X=XM, N=Ohio$N, strata=c(2,3),
       nII0=(30+1:6),
       nII1=(40+1:6))
```
