gsym.point: Construction of confidence intervals for the Generalized...

gsym.point    R Documentation

Construction of confidence intervals for the Generalized Symmetry point and its accuracy measures through two methods

Description

gsym.point is used to construct confidence intervals for the Generalized Symmetry point and its accuracy measures (sensitivity and specificity) for a continuous diagnostic test using two methods: the Generalized Pivotal Quantity (GPQ) method and the Empirical Likelihood (EL) method.

Usage

gsym.point (methods, data, marker, status, tag.healthy, categorical.cov = NULL, 
CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95, 
trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)

Arguments

methods

a character vector selecting the method(s) to be used for estimating the Generalized Symmetry point and its accuracy measures. The possible options are: "GPQ", "EL", "auto", c("GPQ","EL") or c("EL","GPQ").

data

a data frame containing all the variables needed: the diagnostic marker, the true disease status and, when necessary, the categorical covariate.

marker

a character string with the name of the diagnostic test variable.

status

a character string with the name of the variable that distinguishes healthy from diseased individuals.

tag.healthy

the value codifying healthy individuals in the status variable.

categorical.cov

a character string with the name of the categorical covariate according to which the Generalized Symmetry point is to be calculated. The default is NULL (no categorical covariate is considered in the analysis).

CFN

a numerical value that specifies the cost of a false negative decision. The default value is 1.

CFP

a numerical value that specifies the cost of a false positive decision. The default value is 1.

control

output of the control.gsym.point function that controls the whole calculation process of the Generalized Symmetry point.

confidence.level

a numerical value with the confidence level for the construction of the confidence intervals. The default value is 0.95.

trace

a logical value to show information on progress when it is TRUE. The default value is FALSE.

seed

a logical value indicating whether a seed is fixed when generating the trials used to compute the confidence intervals, so that the same simulation process can be reproduced. The default value is FALSE.

value.seed

the numerical value for the fixed seed when seed is TRUE. The default value is 3.

verbose

a logical value; when TRUE, extra information on the normality assumption and the Shapiro-Wilk normality p-values is shown. The default value is FALSE.

Details

The Symmetry point c_{S} satisfies the equality p(c_{S}) = q(c_{S}), where p and q denote, respectively, the specificity (or true negative fraction) and the sensitivity (or true positive fraction). Geometrically, it is the point where the ROC curve intersects the line y = 1 - x (the perpendicular to the positive diagonal). It can also be seen as the point that simultaneously maximizes both types of correct classification (Riddle and Stratford, 1999; Gallop et al., 2003) and therefore corresponds to the probability of correctly classifying any subject, whether healthy or diseased (Jiménez-Valverde, 2012; 2014).

Taking into account the costs associated with false positive and false negative misclassifications, C_{FP} and C_{FN}, an extension of the Symmetry point called the Generalized Symmetry point, c_{GS}, can be defined as follows (López-Ratón et al., 2016):

\rho (1-p(c_{GS})) = 1-q(c_{GS})

where \rho = \frac{C_{FP}}{C_{FN}} is the relative loss (cost) of a false positive classification as compared with a false negative classification. Analogously to the Symmetry point, c_{GS} is obtained graphically by the intersection point between the ROC curve and the line y = 1 - \rho x.
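
As a rough illustration of this definition (not the estimation procedure used by the package), the defining equation can be solved empirically by scanning the observed marker values and keeping the cutpoint at which \rho (1-p(c)) and 1-q(c) are closest. The helper name `empirical_gsym` and the simulated data are assumptions made for this sketch, which also assumes that larger marker values indicate disease.

```r
# Hypothetical helper (not part of GsymPoint): empirical solution of
# rho * (1 - p(c)) = 1 - q(c), assuming larger marker values indicate disease.
empirical_gsym <- function(healthy, diseased, CFN = 1, CFP = 1) {
  rho <- CFP / CFN
  cuts <- sort(unique(c(healthy, diseased)))
  # Specificity p(c): proportion of healthy subjects at or below the cutpoint
  spec <- vapply(cuts, function(c0) mean(healthy <= c0), numeric(1))
  # Sensitivity q(c): proportion of diseased subjects above the cutpoint
  sens <- vapply(cuts, function(c0) mean(diseased > c0), numeric(1))
  # Distance between the two sides of the defining equation
  gap <- abs(rho * (1 - spec) - (1 - sens))
  best <- which.min(gap)
  list(cutpoint = cuts[best], specificity = spec[best],
       sensitivity = sens[best], rho = rho)
}

# Simulated data, used only for illustration
set.seed(1)
healthy  <- rnorm(200, mean = 0, sd = 1)
diseased <- rnorm(200, mean = 2, sd = 1)
res <- empirical_gsym(healthy, diseased)
print(res)
```

With CFN = CFP = 1 the sketch returns an empirical Symmetry point, where sensitivity and specificity are approximately equal.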

In this package, the two methods proposed in López-Ratón et al. (2016) for estimating the Generalized Symmetry point and its sensitivity and specificity indexes are available:

"GPQ": Method based on the Generalized Pivotal Quantity (Weerahandi, 1993; 1995; Lai et al., 2012). It assumes that the diagnostic test values in both groups, or a monotone Box-Cox transformation of them, are normally distributed. Under this assumption, the Generalized Symmetry point c_{GS} can be estimated from the following equation:

\Phi(a+b\Phi^{-1}(t)) = 1-\rho t \Leftrightarrow \Phi \left(\frac{\Phi^{-1}(1-\rho t)-a}{b}\right)-t=0

where a=\frac{\mu_1-\mu_0}{\sigma_1}, b=\frac{\sigma_0}{\sigma_1}, t=1-p(c_{GS}) and \Phi denotes the standard Normal cumulative distribution function (cdf), with \mu_i and \sigma_i, i = 0,1, the mean and standard deviation of healthy (i=0) and diseased (i=1) populations, respectively. To check the assumption of normality, the Shapiro-Wilk test is used with a significance level of 5%.
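
To make the equation above concrete, the following minimal sketch solves it numerically for t with base R's `uniroot` and maps the root back to a cutpoint through the healthy-population quantile. It covers only the point estimate for given means and standard deviations, not the generalized pivotal quantities or the confidence intervals; the helper name `binormal_gsym` and the parameter values are assumptions made for illustration.

```r
# Hypothetical helper (not part of GsymPoint): point estimate of c_GS under
# the binormal model, for given means and standard deviations.
binormal_gsym <- function(mu0, sigma0, mu1, sigma1, CFN = 1, CFP = 1) {
  rho <- CFP / CFN
  a <- (mu1 - mu0) / sigma1
  b <- sigma0 / sigma1
  # Root in t = 1 - p(c_GS) of Phi(a + b * Phi^{-1}(t)) - (1 - rho * t) = 0
  f <- function(t) pnorm(a + b * qnorm(t)) - (1 - rho * t)
  eps <- 1e-10
  upper <- min(1, 1 / rho)   # keeps 1 - rho * t inside [0, 1]
  t_hat <- uniroot(f, lower = eps, upper = upper - eps, tol = 1e-12)$root
  spec <- 1 - t_hat
  list(cutpoint = mu0 + sigma0 * qnorm(spec),  # p-th quantile of the healthy group
       specificity = spec,
       sensitivity = 1 - rho * t_hat)
}

# Equal costs: by symmetry the cutpoint lies midway between the two means
res_equal <- binormal_gsym(mu0 = 0, sigma0 = 1, mu1 = 2, sigma1 = 1)
# A false positive costs twice as much as a false negative (rho = 2)
res_costs <- binormal_gsym(mu0 = 0, sigma0 = 1, mu1 = 2, sigma1 = 1.5, CFP = 2)
print(res_equal)
print(res_costs)
```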

"EL": Method based on the Empirical Likelihood (Thomas and Grunkemeier, 1975). It takes into account that c_{GS} can be seen as two specific quantiles: the p(c_{GS})-th quantile of the healthy population and the \rho (1-p(c_{GS}))-th quantile of the diseased population. Following the same reasoning as in Molanes-López and Letón (2011), and considering that the value of p(c_{GS}) is known in advance and the Generalized Symmetry point defines an operating point on the ROC curve fulfilling 1-x=p(c_{GS}), the following adjusted empirical log-likelihood ratio function is derived to make inference on c_{GS}:

\ell(c)=2n_0\hat{F}_{0,g_{0}}(c)\log\left(\frac{\hat{F}_{0,g_{0}}(c)}{p(c)}\right) +2n_0(1-\hat{F}_{0,g_{0}}(c))\log\left(\frac{1-\hat{F}_{0,g_{0}}(c)}{1-p(c)}\right)
+2n_1\hat{F}_{1,g_{1}}(c)\log\left(\frac{\hat{F}_{1,g_{1}}(c)}{\rho(1-p(c))}\right) +2n_1(1-\hat{F}_{1,g_{1}}(c))\log\left(\frac{1-\hat{F}_{1,g_{1}}(c)}{1-\rho (1-p(c))}\right),

where \hat{F}_{i,g_{i}}(y)=\frac{1}{n_i}\sum_{k_i=1}^{n_i}\mathbb{K}\left(\frac{y-Y_{ik_i}}{g_{i}}\right) are kernel-type estimates of the cdfs F_{i} of the two populations, i=0,1, with \mathbb{K}(y)=\int_{-\infty}^{y} K(z)\mathrm{d}z the integrated kernel, K a kernel function and g_i the smoothing parameter.
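
The log-likelihood ratio above can be evaluated directly once a kernel and smoothing parameters are chosen. The sketch below is only illustrative: it assumes a Gaussian kernel (so the integrated kernel is `pnorm`), uses `bw.nrd0` as a convenient stand-in for the smoothing parameters, and treats the specificity p as a user-supplied value. The helper names `kernel_cdf` and `el_ratio` are hypothetical, and the bandwidth selection and calibration used by the package may differ.

```r
# Kernel-type cdf estimate: average of integrated-kernel values.
# Assumption: Gaussian kernel, whose integrated kernel is pnorm().
kernel_cdf <- function(y, sample, g) {
  mean(pnorm((y - sample) / g))
}

# Hypothetical helper: adjusted empirical log-likelihood ratio evaluated at a
# candidate cutpoint c0 for a given specificity p, following the formula above.
el_ratio <- function(c0, p, healthy, diseased, g0, g1, rho = 1) {
  n0 <- length(healthy)
  n1 <- length(diseased)
  F0 <- kernel_cdf(c0, healthy, g0)
  F1 <- kernel_cdf(c0, diseased, g1)
  # Contribution of one population: smoothed cdf versus its target quantile level
  term <- function(n, Fhat, target) {
    2 * n * Fhat * log(Fhat / target) +
      2 * n * (1 - Fhat) * log((1 - Fhat) / (1 - target))
  }
  term(n0, F0, p) + term(n1, F1, rho * (1 - p))
}

# Simulated data, used only for illustration
set.seed(1)
healthy  <- rnorm(100, mean = 0, sd = 1)
diseased <- rnorm(100, mean = 2, sd = 1)
g0 <- bw.nrd0(healthy)    # assumption: rule-of-thumb bandwidth
g1 <- bw.nrd0(diseased)

# For these populations the true Symmetry point is 1, with specificity pnorm(1)
near <- el_ratio(1, pnorm(1), healthy, diseased, g0, g1)
far  <- el_ratio(2, pnorm(1), healthy, diseased, g0, g1)
print(c(near = near, far = far))
```

Cutpoints that are compatible with the data give small values of the ratio, so the statistic grows as the candidate cutpoint moves away from the true point.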

"auto": the program selects automatically the most appropriate method of the two available, based on the normality assumption. The GPQ is selected under the normality assumption and the EL otherwise.

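The selection rule used by "auto" can be sketched with base R's `shapiro.test`, using the 5% significance level mentioned for the GPQ method. This is a simplified illustration: the helper name `choose_method` is hypothetical, and the sketch deliberately leaves out the Box-Cox transformation step that the package may try before falling back to the EL method.

```r
# Hypothetical helper (not part of GsymPoint): choose "GPQ" when both groups
# pass the Shapiro-Wilk normality test at level alpha, and "EL" otherwise.
# Simplification: no Box-Cox transformation is attempted here.
choose_method <- function(healthy, diseased, alpha = 0.05) {
  p0 <- shapiro.test(healthy)$p.value
  p1 <- shapiro.test(diseased)$p.value
  method <- if (p0 > alpha && p1 > alpha) "GPQ" else "EL"
  list(method = method, pvalue.healthy = p0, pvalue.diseased = p1)
}

# Simulated data, used only for illustration
set.seed(1)
normal_case <- choose_method(rnorm(200), rnorm(200, mean = 2))
skewed_case <- choose_method(rnorm(200), rexp(200))
print(normal_case$method)
print(skewed_case$method)
```
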
Value

Returns an object of class "gsym.point" with the following components:

methods

a character vector with the value of the methods argument used in the call.

levels.cat

a character vector indicating the levels of the categorical covariate if the categorical.cov argument in the gsym.point function is not NULL.

call

the matched call.

data

the data frame with the variables used in the call.

For each of the methods used in the call, a list with the following components is obtained:

"optimal.result"

a list with the Generalized Symmetry point and its associated sensitivity and specificity accuracy measures with the corresponding confidence intervals.

"AUC"

the numerical value of the Area Under the ROC Curve.

"rho"

the numerical value of the cost ratio \rho = \frac{C_{FP}}{C_{FN}}.

"pvalue.healthy"

the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the marker in the healthy population.

"pvalue.diseased"

the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the marker in the diseased population.

In addition, if the original data are not normally distributed, the following components also appear:

"lambda"

the estimated numerical value of the power used in the Box-Cox transformation.

"normality.transformed"

a character string indicating whether the Box-Cox transformed marker values are normally distributed ("yes") or not ("no").

"pvalue.healthy.transformed"

the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the Box-Cox transformed marker in the healthy population.

"pvalue.diseased.transformed"

the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the Box-Cox transformed marker in the diseased population.

Author(s)

Mónica López-Ratón, Carmen Cadarso-Suárez, Elisa M. Molanes-López and Emilio Letón

References

Gallop, R.J., Crits-Christoph, P., Muenz, L.R. and Tu, X.M. (2003). Determination and interpretation of the optimal operating point for ROC curves derived through generalized linear models. Understanding Statistics 2, 219-242.

Jiménez-Valverde, A. (2012). Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling. Global Ecology and Biogeography 21, 498-507.

Jiménez-Valverde, A. (2014). Threshold-dependence as a desirable attribute for discrimination assessment: implications for the evaluation of species distribution models. Biodiversity and Conservation 23, 369-385.

Lai, C.Y., Tian, L. and Schisterman, E.F. (2012). Exact confidence interval estimation for the Youden index and its corresponding optimal cut-point. Computational Statistics and Data Analysis 56, 1103-1114.

López-Ratón, M., Cadarso-Suárez, C., Molanes-López, E.M. and Letón, E. (2016). Confidence intervals for the Symmetry point: an optimal cutpoint in continuous diagnostic tests. Pharmaceutical Statistics 15(2), 178-192.

López-Ratón, M., Molanes-López, E.M., Letón, E. and Cadarso-Suárez, C. (2017). GsymPoint: An R Package to Estimate the Generalized Symmetry Point, an Optimal Cut-off Point for Binary Classification in Continuous Diagnostic Tests. The R Journal 9(1), 262-283.

Metz, C.E. (1978). Basic Principles of ROC Analysis. Seminars in Nuclear Medicine 8, 283-298.

Molanes-López, E.M. and Letón, E. (2011). Inference of the Youden index and associated threshold using empirical likelihood for quantiles. Statistics in Medicine 30, 2467-2480.

Molanes-López, E.M., Van Keilegom, I. and Veraverbeke, N. (2009). Empirical likelihood for non-smooth criterion functions. Scandinavian Journal of Statistics 36, 413-432.

Remaley, A.T., Sampson, M.L., DeLeo, J.M., Remaley, N.A., Farsi, B.D. and Zweig, M.H. (1999). Prevalence-value-accuracy plots: a new method for comparing diagnostic tests based on misclassification costs. Clinical Chemistry 45, 934-941.

Riddle, D.L. and Stratford, P.W. (1999). Interpreting validity indexes for diagnostic tests: An illustration using the Berg Balance Test. Physical Therapy 79, 939-948.

Rutter, C.M. and Miglioretti, D.L. (2003). Estimating the accuracy of psychological scales using longitudinal data. Biostatistics 4, 97-107.

Thomas, D.R. and Grunkemeier, G.L. (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association 70, 865-871.

Wand, M.P. and Jones, M.C. (1995). Kernel smoothing. Chapman and Hall, London.

Weerahandi, S. (1993). Generalized confidence intervals. Journal of the American Statistical Association 88, 899-905.

Weerahandi, S. (1995). Exact statistical methods for data analysis. Springer-Verlag, New York.

Zhou, W. and Jing, B.Y. (2003). Adjusted empirical likelihood method for quantiles. Annals of the Institute of Statistical Mathematics 55, 689-703.

See Also

control.gsym.point, summary.gsym.point

Examples

library(GsymPoint)

data(melanoma)

###########################################################
# marker: X
# status: group
###########################################################

###########################################################
# Generalized Pivotal Quantity Method ("GPQ"): 
# Original data normally distributed
###########################################################

gsym.point.GPQ.melanoma <- gsym.point(methods = "GPQ", data = melanoma,
  marker = "X", status = "group", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)

summary(gsym.point.GPQ.melanoma)

plot(gsym.point.GPQ.melanoma)


data(prostate)

###########################################################
# marker: marker
# status: status
###########################################################

###########################################################
# Generalized Pivotal Quantity Method ("GPQ"): 
# Box-Cox transformed data normally distributed
###########################################################

gsym.point.GPQ.prostate <- gsym.point(methods = "GPQ", data = prostate,
  marker = "marker", status = "status", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)

summary(gsym.point.GPQ.prostate)

plot(gsym.point.GPQ.prostate)


data(elastase)

###########################################################
# marker: elas
# status: status
###########################################################

###########################################################
# Generalized Pivotal Quantity Method ("GPQ"):
# Original data not normally distributed 
# Box-Cox transformed data not normally distributed
###########################################################

gsym.point.GPQ.elastase <- gsym.point(methods = "GPQ", data = elastase,
  marker = "elas", status = "status", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)

summary(gsym.point.GPQ.elastase)

plot(gsym.point.GPQ.elastase)


GsymPoint documentation built on Nov. 2, 2023, 5:59 p.m.