ssize.pcc: Sample Size Planning for Developing Classifiers Using High...


View source: R/ssizePCC.R

Description

Calculate sample size for training set in developing classifiers using high dimensional data. The calculation is based on the probability of correct classification (PCC).

Usage

ssize.pcc(gamma, stdFC, prev = 0.5, nrFeatures, sigFeatures = 20, verbose = FALSE)

Arguments

gamma

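tolerance between PCC(infinity) and PCC(n), i.e. the accepted difference between the probability of correct classification for an infinitely large training set and for a training set of size n.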

stdFC

expected standardized fold-change; that is, expected fold-change divided by within-class standard deviation.

prev

expected prevalence.

nrFeatures

number of features (variables) considered.

sigFeatures

number of significant features; default (20) should be sufficient for most if not all cases.

verbose

logical; if TRUE, print intermediate results.
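As a rough illustration of the stdFC argument, the sketch below divides an expected fold-change by a within-class standard deviation. The numbers and the names mean_diff, within_sd and std_fc are hypothetical, and the sketch assumes that the fold-change is expressed as a difference in class means on the same (for example log2) scale as the standard deviation.

```r
# Hypothetical planning values, assumed to be on the same (e.g. log2) scale
mean_diff <- 0.8   # expected difference between the two class means
within_sd <- 0.5   # expected within-class standard deviation

# Standardized fold-change: fold-change divided by within-class SD
std_fc <- mean_diff / within_sd
std_fc
```

A value obtained this way could then be passed as stdFC.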

Details

The computations are based on the algorithm provided in Section 4.2 of Dobbin and Simon (2007). Prevalence is incorporated by the simple rough approach given in Section 4.4 (ibid.).

For a prevalence of 50%, the results are identical to the numbers computed by https://brb.nci.nih.gov/brb/samplesize/samplesize4GE.html. For other prevalences the numbers differ and are larger for our implementation.
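To see how the prevalence enters, one might run the same setting once with the default prev = 0.5 and once with an unbalanced value. This is only a sketch: it assumes the MKmisc package is installed, the value prev = 0.3 is an arbitrary illustration, and no particular output is claimed here.

```r
# Sketch: same gamma and stdFC, balanced versus unbalanced prevalence.
# Runs only if MKmisc is available; prev = 0.3 is an arbitrary choice.
if (requireNamespace("MKmisc", quietly = TRUE)) {
  res_balanced <- MKmisc::ssize.pcc(gamma = 0.1, stdFC = 1.6, prev = 0.5,
                                    nrFeatures = 22000)
  res_unbalanced <- MKmisc::ssize.pcc(gamma = 0.1, stdFC = 1.6, prev = 0.3,
                                      nrFeatures = 22000)
  print(res_balanced)
  print(res_unbalanced)
}
```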

Value

Object of class "power.htest", a list of the arguments (including the computed one) augmented with method and note elements.
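Because the result is a list, its elements can presumably be accessed by name. The sketch below assumes that MKmisc is installed and that the computed sample sizes are stored as n1 and n2, which is what the printed example output suggests; n_total is a hypothetical helper variable.

```r
# Sketch: pull the computed sample sizes out of the returned list.
# The element names n1 and n2 are inferred from the printed output.
if (requireNamespace("MKmisc", quietly = TRUE)) {
  res <- MKmisc::ssize.pcc(gamma = 0.1, stdFC = 1.6, nrFeatures = 22000)
  n_total <- res$n1 + res$n2   # cases plus controls
  print(n_total)
  print(res$method)            # descriptive title stored with the result
}
```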

Note

optimize is used to solve equation (4.3) of Dobbin and Simon (2007), so you may see errors from it.
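If such errors should not stop a larger script, the call can be wrapped in tryCatch. The wrapper safe_ssize below is a hypothetical helper, not part of MKmisc; it returns NULL whenever ssize.pcc (or the optimize call inside it) signals an error, including the case where the package is not installed.

```r
# Hypothetical wrapper: returns NULL instead of stopping on an error
safe_ssize <- function(...) {
  tryCatch(
    MKmisc::ssize.pcc(...),
    error = function(e) {
      # Report the problem and signal failure with NULL
      message("ssize.pcc failed: ", conditionMessage(e))
      NULL
    }
  )
}

res <- safe_ssize(gamma = 0.1, stdFC = 1.6, nrFeatures = 22000)
if (is.null(res)) {
  message("No sample size could be computed for these settings.")
}
```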

Author(s)

Matthias Kohl Matthias.Kohl@stamats.de

References

K. Dobbin and R. Simon (2007). Sample size planning for developing classifiers using high-dimensional DNA microarray data. Biostatistics, 8(1):101-117.

K. Dobbin, Y. Zhao, R. Simon (2008). How Large a Training Set is Needed to Develop a Classifier for Microarray Data? Clin Cancer Res., 14(1):108-114.

See Also

optimize

Examples

## see Table 2 of Dobbin et al. (2008)
g <- 0.1
fc <- 1.6
ssize.pcc(gamma = g, stdFC = fc, nrFeatures = 22000)

## see Table 3 of Dobbin et al. (2008)
g <- 0.05
fc <- 1.1
ssize.pcc(gamma = g, stdFC = fc, nrFeatures = 22000)

Example output

     Sample Size Planning for Developing Classifiers Using High Dimensional Data 

          gamma = 0.1
           prev = 0.5
     nrFeatures = 22000
             n1 = 21
             n2 = 21

NOTE: n1 is number of cases, n2 is number of controls


     Sample Size Planning for Developing Classifiers Using High Dimensional Data 

          gamma = 0.05
           prev = 0.5
     nrFeatures = 22000
             n1 = 47
             n2 = 47

NOTE: n1 is number of cases, n2 is number of controls

MKmisc documentation built on Aug. 8, 2021, 5:06 p.m.