cv_method_MC: MC simulation-based method to calculate the PCC of a CV-based...


Description

Determine the probability of correct classification (PCC) for a high-dimensional classification study that employs a cross-validated classifier. In contrast to cv_method, this function also generates a test dataset, so the estimated PCC does not rely on the normal approximation used in the PCC formula.

Usage

	cv_method_MC(mu0, p, m, n, alpha_list, nrep, p1 = 0.5, ss = F, ntest, 
	sampling.p=0.5)

Arguments

mu0

The effect size of the important features.

p

The total number of features.

m

The number of important features.

n

The total sample size across the two groups that is used to develop the classifier.

alpha_list

The search grid for the p-value threshold. The examples below use only three values so that they run quickly, but ideally this should be a dense grid.
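One possible way to build a denser grid is sketched below. The log spacing, the number of points, and the name alpha_grid are illustrative assumptions, not recommendations from the package; the end points simply match the smallest and largest cutoffs used in the example on this page.

```r
# Illustrative only: 25 log-spaced p-value cutoffs between 1e-7 and 1e-2.
# The spacing and the grid size are assumptions, not package defaults.
alpha_grid <- 10^seq(-7, -2, length.out = 25)
```

Such a vector could then be passed as alpha_list; a denser grid increases the run time accordingly.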

nrep

The number of simulation replicates employed to compute the expected PCC and/or sensitivity and specificity.

p1

The prevalence of group 1 in the population; defaults to 0.5.

ss

Logical; defaults to FALSE. If TRUE, the function also computes the sensitivity and the specificity of the classifier.

ntest

Sample size for the test dataset.

sampling.p

The assumed proportion of group 1 samples in the training data; default of 0.5 assumes groups are equally represented regardless of p1.

Details

Refer to Sanchez, Wu, Song, and Wang (2016), Section 2.2. This function was used to verify that a given sample size achieves the target PCC in Table 1 of the manuscript.
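The general idea of estimating the PCC from a simulated test set, rather than from a formula, can be sketched as follows. This is a simplified illustration and not the HDDesign implementation: the helper names simulate_pcc_once and estimate_pcc are hypothetical, the threshold alpha is fixed instead of being chosen by cross-validation, features are assumed independent with unit variance, and the group means on the important features are assumed to be +mu0 and -mu0.

```r
# Hypothetical sketch, NOT the package code: one simulated training/test pair.
simulate_pcc_once <- function(mu0, p, m, n, alpha, ntest, sampling.p = 0.5) {
  # Assumed mean shift: the first m features are important, the rest are noise.
  delta <- c(rep(mu0, m), rep(0, p - m))

  # Draw labels (+1 / -1) and independent unit-variance normal features.
  draw <- function(size) {
    n1 <- round(size * sampling.p)
    y <- c(rep(1, n1), rep(-1, size - n1))
    x <- matrix(rnorm(size * p), nrow = size) + outer(y, delta)
    list(x = x, y = y)
  }
  train <- draw(n)
  test <- draw(ntest)

  # Two-sample pooled-variance t-test per feature on the training data
  # (each group needs at least two training samples).
  g1 <- train$x[train$y == 1, , drop = FALSE]
  g2 <- train$x[train$y == -1, , drop = FALSE]
  n1 <- nrow(g1)
  n2 <- nrow(g2)
  diff_means <- colMeans(g1) - colMeans(g2)
  pooled_var <- ((n1 - 1) * apply(g1, 2, var) +
                 (n2 - 1) * apply(g2, 2, var)) / (n - 2)
  tstat <- diff_means / sqrt(pooled_var * (1 / n1 + 1 / n2))
  pval <- 2 * pt(-abs(tstat), df = n - 2)

  # Keep features whose p-value falls below the threshold.
  keep <- which(pval < alpha)
  if (length(keep) == 0) return(0.5)  # nothing selected: treat as a coin flip

  # Linear score on the selected features, centered at the group midpoint.
  midpoint <- (colMeans(g1) + colMeans(g2)) / 2
  centered <- sweep(test$x[, keep, drop = FALSE], 2, midpoint[keep])
  score <- as.vector(centered %*% diff_means[keep])
  pred <- ifelse(score > 0, 1, -1)

  # Proportion of test samples classified correctly.
  mean(pred == test$y)
}

# Average the test-set accuracy over nrep independent replicates.
estimate_pcc <- function(nrep, mu0, p, m, n, alpha, ntest, sampling.p = 0.5) {
  mean(vapply(seq_len(nrep), function(i) {
    simulate_pcc_once(mu0, p, m, n, alpha, ntest, sampling.p)
  }, numeric(1)))
}

set.seed(1)
pcc_hat <- estimate_pcc(nrep = 20, mu0 = 0.4, p = 200, m = 10, n = 80,
                        alpha = 0.01, ntest = 100)
```

Because accuracy is measured directly on simulated test samples, the estimate carries Monte Carlo error that shrinks as nrep and ntest grow.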

Value

If ss=FALSE, the function returns the expected PCC. If ss=TRUE, the function returns a vector containing the expected PCC, sensitivity and specificity.
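As a small illustration of working with the ss=TRUE result, the snippet below unpacks a length-three vector by position. It assumes the elements come in the order listed above (PCC, sensitivity, specificity); the numbers are copied from the example output on this page rather than produced by a fresh call, and the element names are labels added here for readability, not names returned by the function.

```r
# Values copied from the documented example output, not a new simulation.
res <- c(0.818, 0.882, 0.754)

# Assumed order: expected PCC, sensitivity, specificity.
names(res) <- c("pcc", "sensitivity", "specificity")
pcc  <- res[["pcc"]]
sens <- res[["sensitivity"]]
spec <- res[["specificity"]]
```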

Author(s)

Meihua Wu <meihuawu@umich.edu>, Brisa N. Sanchez <brisa@umich.edu>, Peter X.K. Song <pxsong@umich.edu>, Raymond Luu <raluu@umich.edu>, Wen Wang <wangwen@umich.edu>

References

Sanchez, B.N., Wu, M., Song, P.X.K., and Wang, W. (2016). "Study design in high-dimensional classification analysis." Biostatistics, in press.

Examples

	set.seed(1)
	cv_method_MC(mu0=0.4,p=500,m=10,n=80,alpha_list=c(0.0000001,0.0001,0.01),
	nrep=10,p1=0.6,ss=TRUE,ntest=100)
#return: 0.818 0.882 0.754
#alpha_list should be a dense list of p-value cutoffs; 
#here we only use a few values to ease computation of the example.

HDDesign documentation built on May 2, 2019, 6:41 a.m.