View source: R/SAT.stage1.sampling.r
This function implements the stage 1 (pilot) subsampling of SAT using the SGS method.
SAT.stage1.sampling(r1, n, S, Rpar = 0.5)
r1 | pilot subsample size.
n | total sample size.
S | a binary vector of length n. Surrogate observations for all samples.
Rpar | case proportion parameter. The recommended range is (0.3, 0.6), and the default is 0.5.
The region of Rpar that corresponds to lower MSEs is (0.3, 0.5) for case prevalence (i.e., P(Y=1|X)) around 4%. To avoid failures in the estimation when the case prevalence is low and r1 is small, a slightly larger Rpar in (0.5, 0.6) can be used without compromising the performance of SAT. Using Rpar=0.5 is a safe choice for most situations.
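To make the role of Rpar concrete, here is a minimal sketch of a surrogate-stratified pilot draw. It is not the package's code: the function name `sketch_stage1_sampling` is hypothetical, the data are simulated, and it assumes that Rpar is the target share of surrogate-positive (S = 1) patients in the pilot subsample. The actual implementation in SAT may differ.

```r
# Hypothetical sketch, not the SAT package's implementation.
# Assumption: Rpar is the target share of surrogate-positive (S == 1)
# patients in the pilot subsample.
sketch_stage1_sampling <- function(r1, n, S, Rpar = 0.5) {
  stopifnot(length(S) == n, all(S %in% c(0, 1)), Rpar > 0, Rpar < 1)
  pos <- which(S == 1)  # surrogate-positive patients
  neg <- which(S == 0)  # surrogate-negative patients
  # Number of surrogate positives to draw, capped by availability.
  n_pos <- min(round(r1 * Rpar), length(pos))
  n_neg <- min(r1 - n_pos, length(neg))
  # Sample positions within each stratum without replacement.
  c(pos[sample.int(length(pos), n_pos)],
    neg[sample.int(length(neg), n_neg)])
}

set.seed(1)
n <- 10000
S_sim <- rbinom(n, 1, 0.05)  # simulated surrogate, about 5% positive
idx <- sketch_stage1_sampling(r1 = 400, n = n, S = S_sim, Rpar = 0.5)
table(S_sim[idx])            # counts of S = 0 and S = 1 in the pilot sample
```

Under this assumption, a larger Rpar draws more surrogate-positive patients into the pilot sample, which is one way to see why raising it can help when true cases are rare and r1 is small.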
The function returns a vector of indices of the patients for whom manual chart reviews are to be collected.
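As an illustration of how such an index vector might be used, the sketch below builds a chart-review worklist from simulated data. It assumes the returned indices are row positions between 1 and n. The data frame `dat`, its column names, and `stage1_index` are hypothetical stand-ins, not objects from the package.

```r
# Simulated stand-in data; column names are hypothetical.
set.seed(2)
dat <- data.frame(id = 1:1000,
                  S  = rbinom(1000, 1, 0.1),
                  x1 = rnorm(1000))

# Stand-in for the vector returned by SAT.stage1.sampling().
stage1_index <- sort(sample.int(nrow(dat), 50))

# Rows to send for manual chart review.
review_list <- dat[stage1_index, ]

# The true outcome is unknown until review, so start it as missing.
review_list$Y <- NA_integer_
head(review_list)
```

Keeping the outcome column as NA until the reviews come back makes it harder to confuse the surrogate S with the chart-reviewed outcome.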
Liu, X., Chubak, J., Hubbard, R. A. & Chen, Y. (2021). SAT: a Surrogate Assisted Two-wave case boosting sampling method, with application to EHR-based association studies.
library(SAT)
set.seed(0)
colnames(lung_cancer)
X <- cbind(1, lung_cancer[,3:5])
Y <- lung_cancer[,1]
S <- lung_cancer[,2]
# pilot sampling
stage1.index <- SAT.stage1.sampling(r1 = 400, n = 1e5, S, Rpar = 0.5)
head(stage1.index)