Determine the probability of correct classification (PCC) for a high-dimensional classification study that employs a cross-validation classifier. In contrast to cv_method, this function also generates a test dataset, so the estimated PCC does not rely on the normal approximation used in the PCC formula.

cv_method_MC(mu0, p, m, n, alpha_list, nrep, p1 = 0.5, ss = F, ntest,
sampling.p=0.5)
`mu0` |
The effect size of the important features. |

`p` |
The total number of features. |

`m` |
The number of important features. |

`n` |
The total sample size across the two groups used to develop the classifier. |

`alpha_list` |
The search grid for the p-value threshold. The examples below use only three values so that they run quickly, but ideally this should be a dense grid. |

`nrep` |
The number of simulation replicates employed to compute the expected PCC and/or sensitivity and specificity. |

`p1` |
The prevalence of group 1 in the population; defaults to 0.5. |

`ss` |
Logical; defaults to FALSE. If TRUE, the function also computes the sensitivity and specificity of the classifier. |

`ntest` |
Sample size for the test dataset. |

`sampling.p` |
The assumed proportion of group 1 samples in the training data; default of 0.5 assumes groups are equally represented regardless of p1. |
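As a rough illustration of the "dense grid" recommended for `alpha_list`, the sketch below builds a log-spaced grid in base R. The endpoints match the smallest and largest cutoffs used in the example on this page; the number of grid points (50) and the name `alpha_grid` are assumptions chosen for illustration, not recommendations from the package.

```r
# Log-spaced grid of p-value cutoffs between 1e-7 and 1e-2.
# The grid size of 50 is an arbitrary illustrative choice.
alpha_grid <- 10^seq(-7, -2, length.out = 50)

range(alpha_grid)
length(alpha_grid)

# With the package loaded, the grid could be passed as alpha_list, e.g.:
# cv_method_MC(mu0 = 0.4, p = 500, m = 10, n = 80, alpha_list = alpha_grid,
#              nrep = 10, p1 = 0.6, ss = TRUE, ntest = 100)
```

Log spacing is used here because p-value cutoffs of interest typically span several orders of magnitude; a denser grid lengthens the run time accordingly.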

Refer to Sanchez, Wu, Song, and Wang (2016), Section 2.2. This function was used to verify that a given sample size achieves the target PCC in Table 1 of the manuscript.
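The idea of estimating the PCC from a simulated test dataset, rather than from a normal-approximation formula, can be sketched in base R as follows. This is a simplified illustration and not the package's implementation: it uses a single fixed p-value cutoff instead of a cross-validated search over `alpha_list`, and it assumes unit-variance normal features whose means are +mu0 and -mu0 in the two groups for the important features. The helpers `simulate_group_data` and `mc_pcc_sketch` are hypothetical names introduced only for this example.

```r
# Hypothetical helper: simulate n samples with p features, m of them important.
simulate_group_data <- function(n, p, m, mu0, prop1) {
  n1 <- round(n * prop1)
  y <- c(rep(1, n1), rep(-1, n - n1))          # assumed coding: +1 / -1
  x <- matrix(rnorm(n * p), nrow = n, ncol = p) # assumed unit-variance noise
  # Assumption: important features have mean +mu0 (group 1) or -mu0 (group 2).
  x[, seq_len(m)] <- x[, seq_len(m)] + outer(y, rep(mu0, m))
  list(x = x, y = y)
}

# Hypothetical sketch: Monte Carlo PCC with a fixed cutoff `alpha`.
mc_pcc_sketch <- function(mu0, p, m, n, alpha, nrep, p1 = 0.5, ntest,
                          sampling.p = 0.5) {
  pcc <- numeric(nrep)
  for (r in seq_len(nrep)) {
    # Training data use sampling.p; test data use the prevalence p1.
    train <- simulate_group_data(n, p, m, mu0, sampling.p)
    test  <- simulate_group_data(ntest, p, m, mu0, p1)
    g1 <- train$x[train$y == 1, , drop = FALSE]
    g2 <- train$x[train$y == -1, , drop = FALSE]
    n1 <- nrow(g1)
    n2 <- nrow(g2)
    # Pooled-variance two-sample t statistic for every feature.
    mean_diff <- colMeans(g1) - colMeans(g2)
    s2 <- ((n1 - 1) * apply(g1, 2, var) + (n2 - 1) * apply(g2, 2, var)) /
      (n1 + n2 - 2)
    tstat <- mean_diff / sqrt(s2 * (1 / n1 + 1 / n2))
    pval <- 2 * pt(-abs(tstat), df = n1 + n2 - 2)
    # Selected features get weight +1 or -1; all others get weight 0.
    w <- ifelse(pval < alpha, sign(tstat), 0)
    # Score each test sample relative to the midpoint of the group means.
    midpoint <- (colMeans(g1) + colMeans(g2)) / 2
    score <- as.vector(sweep(test$x, 2, midpoint) %*% w)
    pred <- ifelse(score > 0, 1, -1)
    pcc[r] <- mean(pred == test$y)              # test-set accuracy
  }
  mean(pcc)                                     # average over replicates
}

set.seed(1)
est <- mc_pcc_sketch(mu0 = 1, p = 50, m = 5, n = 40, alpha = 0.01,
                     nrep = 5, ntest = 50)
print(est)
```

Because accuracy is measured directly on held-out simulated samples, no distributional approximation for the classifier score is needed; the cost is Monte Carlo error, which shrinks as `nrep` and `ntest` grow.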

If ss=FALSE, the function returns the expected PCC. If ss=TRUE, the function returns a vector containing the expected PCC, sensitivity and specificity.
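When ss=TRUE, it can help to label the returned vector before reporting it. The sketch below uses the values printed in the example on this page as a stand-in for a real function call, and assumes the elements appear in the order listed above (PCC, sensitivity, specificity). The element names are hypothetical labels added for readability, not names set by the function.

```r
# Stand-in for the result of a call with ss = TRUE, copied from the
# example output on this page rather than computed here.
res <- c(0.818, 0.882, 0.754)

# Assumed order: expected PCC, sensitivity, specificity.
names(res) <- c("PCC", "sensitivity", "specificity")

res["PCC"]
res[c("sensitivity", "specificity")]
```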

Meihua Wu, Brisa N. Sanchez, Peter X.K. Song, Raymond Luu, Wen Wang

Sanchez, B.N., Wu, M., Song, P.X.K., and Wang, W. (2016). "Study design in high-dimensional classification analysis." Biostatistics, in press.

set.seed(1)
# alpha_list should be a dense grid of p-value cutoffs;
# only three values are used here to keep the example fast.
cv_method_MC(mu0 = 0.4, p = 500, m = 10, n = 80,
             alpha_list = c(0.0000001, 0.0001, 0.01),
             nrep = 10, p1 = 0.6, ss = TRUE, ntest = 100)
# returns: 0.818 0.882 0.754
