Controlling the optimal-cutpoint selection process

Description

Used to set various parameters controlling the optimal-cutpoint selection process.

Usage

control.cutpoints(costs.ratio = 1, CFP = 1, CFN = 1,
  valueSp = 0.85, valueSe = 0.85, 
  maxSp = TRUE,
  generalized.Youden = FALSE,
  costs.benefits.Youden = FALSE,
  costs.benefits.Efficiency = FALSE,
  weighted.Kappa = FALSE,
  standard.deviation.accuracy = FALSE,
  valueNPV = 0.85, valuePPV = 0.85,
  maxNPV = TRUE,
  valueDLR.Positive = 2,
  valueDLR.Negative = 0.5,
  adjusted.pvalue = c("PADJMS","PALT5","PALT10"),
  ci.SeSp = c("Exact","Quadratic","Wald","AgrestiCoull","RubinSchenker"),
  ci.PV = c("Exact","Quadratic","Wald","AgrestiCoull","RubinSchenker",
  "Transformed","NotTransformed","GartNam"),
  ci.DLR = c("Transformed","NotTransformed","GartNam"))

Arguments

costs.ratio

a numerical value meaningful only in the "CB" method. It specifies the cost ratio:

CR=\frac{C_{FP}-C_{TN}}{C_{FN}-C_{TP}}

where C_{FP}, C_{TN}, C_{FN} and C_{TP} are the costs of False Positive, True Negative, False Negative and True Positive decisions, respectively. The default value is 1.

CFP

a numerical value meaningful only in the "MCT", "Youden" and "MaxKappa" methods. It specifies the cost of a False Positive decision. The default value is 1.

CFN

a numerical value meaningful only in the "MCT", "Youden" and "MaxKappa" methods. It specifies the cost of a False Negative decision. The default value is 1.

valueSp

a numerical value meaningful only in the "MinValueSp", "ValueSp" and "MinValueSpSe" methods. It specifies the (minimum or specific) value set for Specificity. The default value is 0.85.

valueSe

a numerical value meaningful only in the "MinValueSe", "ValueSe" and "MinValueSpSe" methods. It specifies the (minimum or specific) value set for Sensitivity. The default value is 0.85.

maxSp

a logical value meaningful only in the "MinValueSpSe" method, when more than one cutpoint fulfils the conditions. If TRUE, the cutpoints that yield the maximum Specificity are selected; otherwise, those that yield the maximum Sensitivity are selected. The default is TRUE.

generalized.Youden

a logical value meaningful only in the "Youden" method. If TRUE, the Generalized Youden Index is computed. The default is FALSE.

costs.benefits.Youden

a logical value meaningful only in the "Youden" method. If TRUE, the optimal cutpoint based on cost-benefit methodology is computed. The default is FALSE.

costs.benefits.Efficiency

a logical value meaningful only in the "MaxEfficiency" method. If TRUE, the optimal cutpoint based on cost-benefit methodology is computed. The default is FALSE.

weighted.Kappa

a logical value meaningful only in the "MaxKappa" method. If TRUE, the Weighted Kappa Index is computed. The default is FALSE.

standard.deviation.accuracy

a logical value meaningful only in the "MaxEfficiency" method. If TRUE, the standard deviation of the accuracy (or efficiency) at the optimal cutpoint is computed. The default is FALSE.

valueNPV

a numerical value meaningful only in the "MinValueNPV", "ValueNPV" and "MinValueNPVPPV" methods. It specifies the (minimum or specific) value set for the Negative Predictive Value. The default value is 0.85.

valuePPV

a numerical value meaningful only in the "MinValuePPV", "ValuePPV" and "MinValueNPVPPV" methods. It specifies the (minimum or specific) value set for the Positive Predictive Value. The default value is 0.85.

maxNPV

a logical value meaningful only in the "MinValueNPVPPV" method, when more than one cutpoint fulfils the conditions. If TRUE, the cutpoints that yield the maximum Negative Predictive Value are selected; otherwise, those that yield the maximum Positive Predictive Value are selected. The default is TRUE.

valueDLR.Positive

a numerical value meaningful only in the "ValueDLR.Positive" method. It specifies the value set for the Positive Diagnostic Likelihood Ratio. The default value is 2.

valueDLR.Negative

a numerical value meaningful only in the "ValueDLR.Negative" method. It specifies the value set for the Negative Diagnostic Likelihood Ratio. The default value is 0.5.

adjusted.pvalue

a character string meaningful only in the "MinPvalue" method. It specifies the method for adjusting the p-value, namely "PADJMS" for the Miller and Siegmund method, and "PALT5" or "PALT10" for the Altman method (see details). The default is "PADJMS".

ci.SeSp

a character string meaningful only when the argument ci.fit of the optimal.cutpoints function is TRUE. It indicates how the confidence intervals for Sensitivity and Specificity are estimated. Options are "Exact" (Clopper and Pearson 1934), "Quadratic" (Fleiss 1981), "Wald" (Wald and Wolfowitz 1939), "AgrestiCoull" (Agresti and Coull 1998) and "RubinSchenker" (Rubin and Schenker 1987) (see details). The default is "Exact".

ci.PV

a character string meaningful only when the argument ci.fit of the optimal.cutpoints function is TRUE. It indicates how the confidence intervals for the Predictive Values are estimated. Options are "Exact" (Clopper and Pearson 1934), "Quadratic" (Fleiss 1981), "Wald" (Wald and Wolfowitz 1939), "AgrestiCoull" (Agresti and Coull 1998), "RubinSchenker" (Rubin and Schenker 1987), "Transformed" (Simel et al. 1991), "NotTransformed" (Koopman 1984) and "GartNam" (Gart and Nam 1988) (see details). The default is "Exact".

ci.DLR

a character string meaningful only when the argument ci.fit of the optimal.cutpoints function is TRUE. It indicates how the confidence intervals for the Diagnostic Likelihood Ratios are estimated. Options are "Transformed" (Simel et al. 1991), "NotTransformed" (Koopman 1984) and "GartNam" (Gart and Nam 1988) (see details). The default is "Transformed".

Details

The value yielded by this function is used as the control argument of the optimal.cutpoints() function.

Several methods have been proposed for correcting the increase in type-I error associated with the "MinPvalue" criterion. Two of them are implemented in this package: the Miller and Siegmund (1982) method and the Altman et al. (1994) method. The first ("PADJMS" option) uses the minimum observed p-value (pmin) together with the proportion of the sample lying below the lowest cutpoint considered (ε_{low}) and the proportion lying below the highest cutpoint considered (ε_{high}); a fraction ε_{low} of the data is thus excluded in the lower tail and a fraction 1-ε_{high} in the upper tail:

p_{acor}=\phi(z)\left(z-\frac{1}{z}\right)\log\left(\frac{\varepsilon_{high}(1-\varepsilon_{low})}{(1-\varepsilon_{high})\varepsilon_{low}}\right)+4\frac{\phi(z)}{z}

where z is the (1-pmin/2) quantile of the standard normal distribution and φ is its density function. The second method simplifies this formula for specific values of the proportion ε excluded in each tail. With ε = 5% (ε_{low} = 0.05, ε_{high} = 0.95; "PALT5" option):

p_{alt5}=-3.13\,p_{min}\left(1+1.65\ln(p_{min})\right)

and with ε = 10% (ε_{low} = 0.10, ε_{high} = 0.90; "PALT10" option):

p_{alt10}=-1.63\,p_{min}\left(1+2.35\ln(p_{min})\right)

These approximations work well for small values of pmin (0.0001 < pmin < 0.1) and are easy to apply.
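The adjustments above can be sketched in R as follows. This is an illustrative sketch, not the package's internal code: the function names p_adj_ms, p_alt5 and p_alt10 are hypothetical, and ε_high is read as the proportion of the sample lying below the highest cutpoint considered, so that excluding 5% in each tail corresponds to ε_low = 0.05 and ε_high = 0.95. Under that reading, the general Miller and Siegmund formula and the "PALT5" simplification give close values.

```r
# Illustrative sketch of the "MinPvalue" adjustments (not package code).
# Function names are hypothetical. eps_low / eps_high are read as the
# proportions of the sample below the lowest / highest cutpoint considered.

# Miller and Siegmund ("PADJMS")
p_adj_ms <- function(p_min, eps_low, eps_high) {
  z <- qnorm(1 - p_min / 2)  # (1 - p_min/2) quantile of the standard normal
  phi <- dnorm(z)            # standard normal density at z
  phi * (z - 1 / z) * log((eps_high * (1 - eps_low)) / ((1 - eps_high) * eps_low)) +
    4 * phi / z
}

# Altman et al. simplifications for 5% ("PALT5") and 10% ("PALT10") per tail
p_alt5  <- function(p_min) -3.13 * p_min * (1 + 1.65 * log(p_min))
p_alt10 <- function(p_min) -1.63 * p_min * (1 + 2.35 * log(p_min))

p_min <- 0.01
c(PADJMS = p_adj_ms(p_min, eps_low = 0.05, eps_high = 0.95),
  PALT5  = p_alt5(p_min))
```

For pmin = 0.01 both adjusted values are roughly 0.21, about twenty times the unadjusted minimum p-value.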

For inference on Sensitivity and Specificity (which are proportions), some of the most common confidence intervals have been implemented. If \hat{pr}=x/n is the estimated proportion and 1-α is the confidence level, the options are as follows:

"Exact": The exact confidence interval of Clopper and Pearson (1934) based on the exact distribution of a proportion:

\left[\frac{x}{(n-x+1)F_{\alpha/2,2(n-x+1),2x}+x},\ \frac{(x+1)F_{\alpha/2,2(x+1),2(n-x)}}{(n-x)+(x+1)F_{\alpha/2,2(x+1),2(n-x)}}\right]

where F_{α/2,a,b} is the (1-α/2) quantile of a Fisher-Snedecor distribution with a and b degrees of freedom. Note that the "Exact" method cannot be applied when x or n-x is equal to zero, since the quantile of the Fisher-Snedecor distribution is not defined for zero degrees of freedom. In such cases, the program returns NaN for the confidence limit that cannot be computed.

"Quadratic": Fleiss' quadratic confidence interval (Fleiss 1981). It is based on the asymptotic normality of the estimator of a proportion, with a continuity correction added. This approach is valid when x and n-x are greater than 5:

\frac{1}{n+z^{2}_{1-\alpha/2}}\left[(x \mp 0.5)+\frac{z^{2}_{1-\alpha/2}}{2} \mp z_{1-\alpha/2}\sqrt{\frac{z^{2}_{1-\alpha/2}}{4}+\frac{(x \mp 0.5)(n-x \pm 0.5)}{n}}\right]

where z_{1-α/2} is the (1-α/2) quantile of the standard normal distribution.

"Wald": Wald's confidence interval (Wald and Wolfowitz 1939) with a continuity correction. It is based on the asymptotic normality of the maximum-likelihood estimator of a proportion. This approach is valid when x and n-x are greater than 20:

\hat{pr} \mp \left(z_{1-\alpha/2}\sqrt{\frac{\hat{pr}(1-\hat{pr})}{n}}+\frac{1}{2n}\right)

"AgrestiCoull": The confidence interval proposed by Agresti and Coull (1998). It is a score confidence interval, obtained by inverting the score test instead of using the usual (Wald) standard error of the binomial proportion:

\frac{\hat{pr}+\frac{z^{2}_{1-\alpha/2}}{2n} \mp z_{1-\alpha/2}\sqrt{\frac{\hat{pr}(1-\hat{pr})+\frac{z^{2}_{1-\alpha/2}}{4n}}{n}}}{1+\frac{z^{2}_{1-\alpha/2}}{n}}

"RubinSchenker": Rubin and Schenker's logit confidence interval (1987). It uses a logit transformation and Bayesian arguments with a Jeffreys prior:

\mathrm{logit}^{-1}\left[\mathrm{logit}\left(\frac{x+0.5}{n+1}\right) \mp \frac{z_{1-\alpha/2}}{\sqrt{(n+1)\left(\frac{x+0.5}{n+1}\right)\left(1-\frac{x+0.5}{n+1}\right)}}\right]

where the logit function is \mathrm{logit}(q)=\log\left(\frac{q}{1-q}\right) and \mathrm{logit}^{-1}(u)=\frac{e^{u}}{1+e^{u}} is its inverse.
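The five options above can be sketched in R directly from the formulas. This is not the package's implementation: the function names (ci_exact, ci_quadratic, ci_wald, ci_score, ci_rubin_schenker) are hypothetical, and the Wald and Quadratic sketches do not truncate their limits to [0, 1]. The Quadratic sketch applies the continuity correction to x consistently, i.e. it uses (x ∓ 0.5)(n - (x ∓ 0.5)) under the square root, and the RubinSchenker sketch back-transforms with the inverse logit (plogis).

```r
# Sketch of the five proportion intervals, written from the formulas.
# Function names are hypothetical, not package API.
# x = number of "successes", n = number of trials.

z_quantile <- function(conf.level) qnorm(1 - (1 - conf.level) / 2)

# "Exact" (Clopper-Pearson) via F quantiles; NaN where a quantile would need
# zero degrees of freedom (x = 0 for the lower limit, x = n for the upper).
ci_exact <- function(x, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (x == 0) NaN else {
    f <- qf(1 - alpha / 2, 2 * (n - x + 1), 2 * x)
    x / ((n - x + 1) * f + x)
  }
  upper <- if (x == n) NaN else {
    f <- qf(1 - alpha / 2, 2 * (x + 1), 2 * (n - x))
    (x + 1) * f / ((n - x) + (x + 1) * f)
  }
  c(lower = lower, upper = upper)
}

# "Quadratic" (Fleiss): score interval with x corrected to x - 0.5 / x + 0.5.
ci_quadratic <- function(x, n, conf.level = 0.95) {
  z <- z_quantile(conf.level)
  limit <- function(xc, sgn) {
    (xc + z^2 / 2 + sgn * z * sqrt(z^2 / 4 + xc * (n - xc) / n)) / (n + z^2)
  }
  c(lower = limit(x - 0.5, -1), upper = limit(x + 0.5, 1))
}

# "Wald" with continuity correction 1/(2n).
ci_wald <- function(x, n, conf.level = 0.95) {
  z <- z_quantile(conf.level)
  p <- x / n
  half <- z * sqrt(p * (1 - p) / n) + 1 / (2 * n)
  c(lower = p - half, upper = p + half)
}

# "AgrestiCoull": the score interval given by the formula above.
ci_score <- function(x, n, conf.level = 0.95) {
  z <- z_quantile(conf.level)
  p <- x / n
  centre <- p + z^2 / (2 * n)
  half <- z * sqrt((p * (1 - p) + z^2 / (4 * n)) / n)
  c(lower = centre - half, upper = centre + half) / (1 + z^2 / n)
}

# "RubinSchenker": symmetric on the logit scale, then back-transformed.
ci_rubin_schenker <- function(x, n, conf.level = 0.95) {
  z <- z_quantile(conf.level)
  p_tilde <- (x + 0.5) / (n + 1)
  half <- z / sqrt((n + 1) * p_tilde * (1 - p_tilde))
  c(lower = plogis(qlogis(p_tilde) - half),
    upper = plogis(qlogis(p_tilde) + half))
}

# Example: 30 positives out of 50
rbind(Exact = ci_exact(30, 50), Quadratic = ci_quadratic(30, 50),
      Wald = ci_wald(30, 50), AgrestiCoull = ci_score(30, 50),
      RubinSchenker = ci_rubin_schenker(30, 50))
```

For these counts, the "Exact" limits can be cross-checked against binom.test(30, 50)$conf.int, and the "AgrestiCoull" and "Quadratic" limits against prop.test(30, 50, correct = FALSE)$conf.int and prop.test(30, 50, correct = TRUE)$conf.int, respectively. prop.test shrinks its continuity correction when x is within 0.5 of n/2 and truncates at 0 and 1, so the last agreement does not hold for every x.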

Since Diagnostic Likelihood Ratios are ratios of two probabilities, obtaining confidence intervals for them is less direct than for Sensitivity and Specificity. Let pr_{1}=x_{1}/n_{1} be the proportion in the numerator and pr_{2}=x_{2}/n_{2} the proportion in the denominator; for example, DLR^{+} = Sensitivity/(1-Specificity) and DLR^{-} = (1-Sensitivity)/Specificity. Based on the logarithmic transformation of the Likelihood Ratio ("Transformed" option), the 100(1-α)% confidence interval is (Simel et al., 1991):

\exp\left[\ln\left(\frac{\widehat{pr}_{1}}{\widehat{pr}_{2}}\right) \mp z_{1-\alpha/2}\sqrt{\frac{1-\widehat{pr}_{1}}{n_{1}\widehat{pr}_{1}}+\frac{1-\widehat{pr}_{2}}{n_{2}\widehat{pr}_{2}}}\right]

These confidence intervals tend to perform better than untransformed confidence intervals (Koopman 1984; "NotTransformed" option) because the distribution of the estimated Likelihood Ratio is asymmetric (Simel et al., 1991; Roldan Nofuentes and Luna del Castillo, 2007). The untransformed interval is:

\frac{\widehat{pr}_{1}}{\widehat{pr}_{2}} \mp z_{1-\alpha/2}\sqrt{\frac{\widehat{pr}_{1}(1-\widehat{pr}_{1})}{n_{1}\widehat{pr}^{2}_{2}}+\frac{\widehat{pr}^{2}_{1}\widehat{pr}_{2}(1-\widehat{pr}_{2})}{n_{2}\widehat{pr}^{4}_{2}}}
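The "Transformed" and "NotTransformed" intervals can be sketched in R as follows. The function names are hypothetical and the counts in the example are made up; for DLR+ = Sensitivity/(1 - Specificity), pr1 is the Sensitivity (estimated among diseased subjects) and pr2 is 1 - Specificity (estimated among healthy subjects). The untransformed sketch multiplies the delta-method standard error by the normal quantile z, as a normal-theory interval requires.

```r
# Sketch of the "Transformed" (log-scale) and "NotTransformed" intervals for
# the ratio pr1/pr2 of two independent proportions x1/n1 and x2/n2.
# Function names are hypothetical, not package API.
ci_dlr_transformed <- function(x1, n1, x2, n2, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  ratio <- p1 / p2
  se_log <- sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))  # SE of ln(ratio)
  c(estimate = ratio,
    lower = ratio * exp(-z * se_log),
    upper = ratio * exp(z * se_log))
}

ci_dlr_not_transformed <- function(x1, n1, x2, n2, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  ratio <- p1 / p2
  # delta-method standard error of the ratio itself
  se <- sqrt(p1 * (1 - p1) / (n1 * p2^2) + p1^2 * p2 * (1 - p2) / (n2 * p2^4))
  c(estimate = ratio, lower = ratio - z * se, upper = ratio + z * se)
}

# Hypothetical DLR+ example: Sensitivity = 45/50, 1 - Specificity = 10/50
ci_dlr_transformed(45, 50, 10, 50)
ci_dlr_not_transformed(45, 50, 10, 50)
```

The transformed interval is symmetric on the log scale (lower × upper = estimate²), whereas the untransformed one is symmetric around the estimate and can therefore extend below zero in small samples.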

Another confidence interval ("GartNam" option) is based on the interval for the ratio of two independent proportions (Gart and Nam, 1988). Its limits are the two solutions for pr_{1}/pr_{2} of the following quadratic equation:

\frac{\left(\widehat{pr}_{1}-\frac{pr_{1}}{pr_{2}}\widehat{pr}_{2}\right)^{2}}{\frac{\widehat{pr}_{1}(1-\widehat{pr}_{1})}{n_{1}}+\frac{\left(\frac{pr_{1}}{pr_{2}}\right)^{2}\widehat{pr}_{2}(1-\widehat{pr}_{2})}{n_{2}}}=z^{2}_{1-\alpha/2}
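Writing θ = pr1/pr2 and clearing the denominator turns the equation above into an ordinary quadratic aθ² + bθ + c = 0, whose two roots are the confidence limits. The R sketch below solves exactly that equation and nothing more; the function name is hypothetical, and returning NA when no finite interval exists (a ≤ 0 or no real roots) is a choice made for this sketch.

```r
# Sketch: "GartNam"-style limits for theta = pr1/pr2 as the roots of the
# quadratic above. Function name is hypothetical, not package API.
ci_dlr_gart_nam <- function(x1, n1, x2, n2, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  # (p1 - theta*p2)^2 = z^2 * (p1(1-p1)/n1 + theta^2 * p2(1-p2)/n2)
  # rearranges to a*theta^2 + b*theta + c0 = 0 with:
  a  <- p2^2 - z^2 * p2 * (1 - p2) / n2
  b  <- -2 * p1 * p2
  c0 <- p1^2 - z^2 * p1 * (1 - p1) / n1
  disc <- b^2 - 4 * a * c0
  # No finite interval if the parabola opens downwards or has no real roots
  if (a <= 0 || disc < 0) return(c(lower = NA_real_, upper = NA_real_))
  roots <- (-b + c(-1, 1) * sqrt(disc)) / (2 * a)
  c(lower = roots[1], upper = roots[2])
}

# Hypothetical DLR+ example: Sensitivity = 45/50, 1 - Specificity = 10/50
ci_dlr_gart_nam(45, 50, 10, 50)
```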

Inference on the Predictive Values depends on the type of study, i.e., whether it is cross-sectional (prevalence can be estimated from the sample) or case-control. In the former case, the confidence intervals for the Predictive Values are computed exactly as for Sensitivity and Specificity. In a case-control study, however, prevalence is not estimated from the sample, and the confidence intervals are based on those of the Likelihood Ratios. Hence, given a prevalence estimate \hat{p}, by substituting each limit of these intervals into the expressions

\left(1+\frac{1-\hat{p}}{\hat{p}\,\widehat{DLR}^{+}}\right)^{-1}

and

\left(1+\frac{\hat{p}}{1-\hat{p}}\widehat{DLR}^{-}\right)^{-1}

confidence intervals for the Positive and Negative Predictive Values are obtained, where DLR+ and DLR- are the Positive and Negative Diagnostic Likelihood Ratios, respectively.
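The substitution can be sketched in R as follows. Function names, the prevalence and the DLR limits are all hypothetical values chosen for illustration. One detail matters when mapping limits: the Positive Predictive Value increases with DLR+, but the Negative Predictive Value decreases as DLR- increases, so the upper DLR- limit yields the lower NPV limit.

```r
# Sketch: map DLR confidence limits to predictive-value limits for a given
# prevalence estimate. Function names and input values are hypothetical.
ppv_from_dlr <- function(dlr_pos, prev) 1 / (1 + (1 - prev) / (prev * dlr_pos))
npv_from_dlr <- function(dlr_neg, prev) 1 / (1 + prev / (1 - prev) * dlr_neg)

prev <- 0.10                                 # assumed prevalence estimate
dlr_pos_ci <- c(lower = 2.9, upper = 10.1)   # assumed DLR+ limits
dlr_neg_ci <- c(lower = 0.08, upper = 0.25)  # assumed DLR- limits

# PPV is increasing in DLR+, so the limits map in order
ppv_ci <- ppv_from_dlr(dlr_pos_ci, prev)
# NPV is decreasing in DLR-, so reverse the limits before relabelling
npv_ci <- setNames(npv_from_dlr(rev(dlr_neg_ci), prev), c("lower", "upper"))
ppv_ci
npv_ci
```

With these inputs the PPV interval runs from roughly 0.24 to 0.53 and the NPV interval from roughly 0.97 to 0.99.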

Value

A list with components for each of the possible arguments.

Author(s)

Monica Lopez-Raton and Maria Xose Rodriguez-Alvarez

References

Agresti, A. and Coull, B.A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician 52, 119–126.

Altman, D.G., Lausen, B., Sauerbrei, W. and Schumacher, M. (1994). Dangers of using "optimal" cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute 86(11), 829–835.

Clopper, C.J. and Pearson, E.S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413.

Fleiss, J.L. (1981). Statistical methods for rates and proportions. John Wiley & Sons, New York.

Gart, J.J. and Nam, J. (1988). Approximate interval estimation of the ratio of binomial parameters: a review and corrections for skewness. Biometrics 44, 323–338.

Koopman, P.A.R. (1984). Confidence limits for the ratio of two binomial proportions. Biometrics 40, 513–517.

Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38, 1011–1016.

Roldan Nofuentes, J.A. and Luna del Castillo, J.D. (2007). Comparison of the likelihood ratios of two binary diagnostic tests in paired designs. Statistics in Medicine 26, 4179–4201.

Rubin, D.B. and Schenker, N. (1987). Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17, 131–144.

Simel, D.L., Samsa, G.P. and Matchar, D.B. (1991). Likelihood ratios with confidence: sample size estimation for diagnostic test studies. Journal of Clinical Epidemiology 44(8), 763–770.

Wald, A. and Wolfowitz, J. (1939). Confidence limits for continuous distribution functions. The Annals of Mathematical Statistics 10, 105–118.

See Also

optimal.cutpoints

Examples

library(OptimalCutpoints)
data(elas)

###########################################################
# Youden Index Method ("Youden"): Covariate gender
###########################################################
optimal.cutpoint.Youden <- optimal.cutpoints(X = "elas", status = "status",
  tag.healthy = 0, methods = "Youden", data = elas, pop.prev = NULL,
  categorical.cov = "gender", control = control.cutpoints(),
  ci.fit = TRUE, conf.level = 0.95, trace = FALSE)

summary(optimal.cutpoint.Youden)

# Change the method for computing the confidence intervals
# of the Sensitivity and Specificity measures
optimal.cutpoint.Youden <- optimal.cutpoints(X = "elas", status = "status",
  tag.healthy = 0, methods = "Youden", data = elas, pop.prev = NULL,
  categorical.cov = "gender", control = control.cutpoints(ci.SeSp = "AgrestiCoull"),
  ci.fit = TRUE, conf.level = 0.95, trace = FALSE)

summary(optimal.cutpoint.Youden)

# Compute the Generalized Youden Index
optimal.cutpoint.Youden <- optimal.cutpoints(X = "elas", status = "status",
  tag.healthy = 0, methods = "Youden", data = elas, pop.prev = NULL,
  categorical.cov = "gender", control = control.cutpoints(generalized.Youden = TRUE),
  ci.fit = TRUE, conf.level = 0.95, trace = FALSE)

summary(optimal.cutpoint.Youden)