gsym.point | R Documentation |

gsym.point is used to construct confidence intervals for the Generalized Symmetry point and its accuracy measures (sensitivity and specificity) for a continuous diagnostic test using two methods: the Generalized Pivotal Quantity (GPQ) method and the Empirical Likelihood (EL) method.

```
gsym.point (methods, data, marker, status, tag.healthy, categorical.cov = NULL,
CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)
```

`methods` |
a character vector selecting the method(s) to be used for estimating the Generalized Symmetry point and its accuracy measures. The possible options are: "GPQ", "EL", "auto", c("GPQ","EL") or c("EL","GPQ"). |

`data` |
a data frame containing all needed variables: the diagnostic marker, the true disease status and, when necessary, the categorical covariate. |

`marker` |
a character string with the name of the diagnostic test variable. |

`status` |
a character string with the name of the variable that distinguishes healthy from diseased individuals. |

`tag.healthy` |
the value codifying healthy individuals in the |

`categorical.cov` |
a character string with the name of the categorical covariate according to which the Generalized Symmetry point is to be calculated. The default is NULL (no categorical covariate is considered in the analysis). |

`CFN` |
a numerical value that specifies the cost of a false negative decision. The default value is 1. |

`CFP` |
a numerical value that specifies the cost of a false positive decision. The default value is 1. |

`control` |
output of the |

`confidence.level` |
a numerical value with the confidence level for the construction of the confidence intervals. The default value is 0.95. |

`trace` |
a logical value to show information on progress when it is TRUE. The default value is FALSE. |

`seed` |
a logical value indicating whether a seed is fixed for generating the trials used to compute the confidence intervals, so that the same simulation process can be reproduced. The default value is FALSE. |

`value.seed` |
the numerical value for the fixed seed when |

`verbose` |
a logical value that, when TRUE, shows extra information on the normality assumption and the Shapiro-Wilk normality p-values. The default value is FALSE. |

The Symmetry point `c_{S}` satisfies the equality `p(c_{S}) = q(c_{S})`, where `p` and `q` denote, respectively, the specificity (or true negative fraction) and the sensitivity (or true positive fraction). Geometrically, it is the point where the ROC curve and the line `y = 1 - x` (the perpendicular to the positive diagonal line) intersect. It can also be seen as the point that simultaneously maximizes both types of correct classification (Riddle and Stratford, 1999; Gallop et al., 2003), and it therefore corresponds to the probability of correctly classifying any subject, whether healthy or diseased (Jiménez-Valverde, 2012; 2014).

Taking into account the costs associated with false positive and false negative misclassifications, `C_{FP}` and `C_{FN}`, an extension of the Symmetry point called the Generalized Symmetry point, `c_{GS}`, can be defined as follows (López-Ratón et al., 2016):

`\rho (1-p(c_{GS})) = 1-q(c_{GS})`

where `\rho = \frac{C_{FP}}{C_{FN}}` is the relative loss (cost) of a false positive classification as compared with a false negative classification. Analogously to the Symmetry point, `c_{GS}` is obtained graphically as the intersection point between the ROC curve and the line `y = 1 - \rho x`.
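The defining equality can be illustrated with empirical estimates of sensitivity and specificity. The following is a minimal sketch, not the estimator used by the package: `empirical_gsym` is a hypothetical helper, the data are simulated normal quantiles, and larger marker values are assumed to indicate disease.

```r
# Hypothetical helper (not part of GsymPoint): grid search for the cutoff that
# best satisfies rho * (1 - Sp(c)) = 1 - Se(c) using empirical Se and Sp.
# Assumption: larger marker values indicate disease.
empirical_gsym <- function(healthy, diseased, CFN = 1, CFP = 1) {
  rho <- CFP / CFN
  cutoffs <- sort(unique(c(healthy, diseased)))
  # Empirical specificity and sensitivity at every candidate cutoff
  sp <- vapply(cutoffs, function(c0) mean(healthy <= c0), numeric(1))
  se <- vapply(cutoffs, function(c0) mean(diseased > c0), numeric(1))
  # Distance from the defining equality of the Generalized Symmetry point
  gap <- abs(rho * (1 - sp) - (1 - se))
  best <- which.min(gap)
  list(cutoff = cutoffs[best], specificity = sp[best],
       sensitivity = se[best], rho = rho)
}

# Deterministic toy data: normal quantiles centred at 0 (healthy) and 2 (diseased)
healthy  <- qnorm(ppoints(200))
diseased <- 2 + qnorm(ppoints(200))
res <- empirical_gsym(healthy, diseased)
res$cutoff  # close to the midpoint 1, since rho = 1 and both spreads are equal
```

With `CFN` and `CFP` both equal to 1, the sketch reduces to the Symmetry point, where sensitivity and specificity coincide.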

In this package, the two methods proposed in López-Ratón et al. (2016) for estimating the Generalized Symmetry point and its sensitivity and specificity indexes are available:

`"GPQ"`: Method based on the Generalized Pivotal Quantity (Weerahandi, 1993; 1995; Lai et al., 2012). It assumes that the diagnostic test values in both groups, or a monotone Box-Cox transformation of them, are normally distributed. Under this assumption, the Generalized Symmetry point `c_{GS}` can be estimated from the following equation:

`\Phi(a+b\Phi^{-1}(t)) = 1-\rho t \Leftrightarrow \Phi \left(\frac{\Phi^{-1}(1-\rho t)-a}{b}\right)-t=0`

where `a=\frac{\mu_1-\mu_0}{\sigma_1}`, `b=\frac{\sigma_0}{\sigma_1}`, `t=1-p(c_{GS})` and `\Phi` denotes the standard Normal cumulative distribution function (cdf), with `\mu_i` and `\sigma_i`, `i = 0,1`, the mean and standard deviation of the healthy (`i = 0`) and diseased (`i = 1`) populations, respectively. To check the assumption of normality, the Shapiro-Wilk test is used with a significance level of 5%.
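For known binormal parameters, the equation above can be solved numerically for `t`. The sketch below is only an illustration of that root-finding step, not the GPQ interval construction itself: `binormal_gsym` is a hypothetical helper, the parameter values are made up, and larger marker values are assumed to indicate disease.

```r
# Hypothetical helper (not part of GsymPoint): solve
#   pnorm((qnorm(1 - rho * t) - a) / b) - t = 0
# for t = 1 - p(c_GS), given the binormal parameters.
# Assumption: larger marker values indicate disease.
binormal_gsym <- function(mu0, sigma0, mu1, sigma1, CFN = 1, CFP = 1) {
  rho <- CFP / CFN
  a <- (mu1 - mu0) / sigma1
  b <- sigma0 / sigma1
  f <- function(t) pnorm((qnorm(1 - rho * t) - a) / b) - t
  # 1 - rho * t must stay inside (0, 1), which bounds the search interval
  eps <- 1e-10
  upper <- min(1, 1 / rho) - eps
  t_hat <- uniroot(f, lower = eps, upper = upper, tol = 1e-10)$root
  p <- 1 - t_hat                      # specificity at c_GS
  list(cutoff = mu0 + sigma0 * qnorm(p),
       specificity = p,
       sensitivity = 1 - rho * t_hat)
}

# Illustrative parameters: healthy ~ N(0, 1), diseased ~ N(2, 1), equal costs
fit <- binormal_gsym(mu0 = 0, sigma0 = 1, mu1 = 2, sigma1 = 1)
fit$cutoff  # 1 (up to numerical tolerance), the midpoint between the two means
```

In practice the means and standard deviations are unknown, and the GPQ method replaces them with generalized pivotal quantities to obtain confidence intervals.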

`"EL"`: Method based on the Empirical Likelihood (Thomas and Grunkemeier, 1975). It takes into account that `c_{GS}` can be seen as two specific quantiles, the `p(c_{GS})`-th quantile of the healthy population and the `\rho (1-p(c_{GS}))`-th quantile of the diseased population. Following the same reasoning as in Molanes-López and Letón (2011), and considering that the value of `p(c_{GS})` is known in advance and the Generalized Symmetry point defines an operating point on the ROC curve fulfilling `1-x=p(c_{GS})`, the following adjusted empirical log-likelihood ratio function is derived to make inference on `c_{GS}`:

```
\ell(c) = 2n_0\hat{F}_{0,g_0}(c)\log\left(\frac{\hat{F}_{0,g_0}(c)}{p(c)}\right)
+2n_0(1-\hat{F}_{0,g_0}(c))\log\left(\frac{1-\hat{F}_{0,g_0}(c)}{1-p(c)}\right)
+2n_1\hat{F}_{1,g_1}(c)\log\left(\frac{\hat{F}_{1,g_1}(c)}{\rho(1-p(c))}\right)
+2n_1(1-\hat{F}_{1,g_1}(c))\log\left(\frac{1-\hat{F}_{1,g_1}(c)}{1-\rho(1-p(c))}\right),
```

where `\hat{F}_{i,g_{i}}(y)=\frac{1}{n_i}\sum_{k_i=1}^{n_i}\mathbb{K}\left(\frac{y-Y_{ik_i}}{g_{i}}\right)` are kernel-type estimates of the cdfs `F_{i}` of the two populations, `i=0,1`, with `\mathbb{K}(y)=\int_{-\infty}^{y} K(z)\mathrm{d}z`, `K` a kernel function and `g_i` the smoothing parameter, for `i=0,1`.
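The kernel-type cdf estimate used in the log-likelihood ratio can be sketched in a few lines. This is an illustration only: `kernel_cdf` is a hypothetical helper, and both the Gaussian kernel (whose integral is `pnorm`) and the `bw.nrd0` bandwidth are assumptions rather than the choices made inside the package.

```r
# Hypothetical helper (not part of GsymPoint): kernel-smoothed cdf estimate
#   F_hat(y) = (1 / n) * sum_k K_int((y - Y_k) / g)
# Assumptions: Gaussian kernel, so the integrated kernel K_int is pnorm(),
# and a rule-of-thumb bandwidth from stats::bw.nrd0().
kernel_cdf <- function(obs, g = bw.nrd0(obs)) {
  function(y) {
    vapply(y, function(y0) mean(pnorm((y0 - obs) / g)), numeric(1))
  }
}

# Deterministic toy sample: standard normal quantiles
obs0 <- qnorm(ppoints(100))
F0_hat <- kernel_cdf(obs0)

# Evaluate the smoothed cdf on a grid of candidate cutoffs
grid <- seq(-3, 3, by = 0.5)
vals <- F0_hat(grid)
round(vals, 3)
```

Unlike the empirical cdf, this estimate is smooth and strictly increasing in `y`, so it can be plugged into a continuous criterion such as the log-likelihood ratio.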

`"auto"`: the program automatically selects the most appropriate of the two available methods, based on the normality assumption. The GPQ method is selected under the normality assumption and the EL method otherwise.
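The selection rule can be pictured with the Shapiro-Wilk test at the 5% level. The sketch below is a simplified guess at the logic, not the package's code: `choose_method` is a hypothetical helper, and the Box-Cox step described for the GPQ method is deliberately left out.

```r
# Hypothetical helper (not part of GsymPoint): pick "GPQ" when the Shapiro-Wilk
# test does not reject normality in either group at level alpha, "EL" otherwise.
# Assumption: the Box-Cox transformation step is ignored in this sketch.
choose_method <- function(healthy, diseased, alpha = 0.05) {
  p_healthy  <- shapiro.test(healthy)$p.value
  p_diseased <- shapiro.test(diseased)$p.value
  method <- if (p_healthy > alpha && p_diseased > alpha) "GPQ" else "EL"
  list(method = method,
       pvalue.healthy = p_healthy,
       pvalue.diseased = p_diseased)
}

# Deterministic toy samples: normal quantiles and strongly skewed quantiles
normal_like <- qnorm(ppoints(100))
skewed      <- qexp(ppoints(100))

sel_normal <- choose_method(normal_like, normal_like + 2)
sel_skewed <- choose_method(normal_like, skewed)
sel_normal$method
sel_skewed$method
```

Both groups have to pass the normality check for the GPQ method to be chosen; a single rejection is enough to fall back to the EL method.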

Returns an object of class "gsym.point" with the following components:

`methods` |
a character vector with the value of the |

`levels.cat` |
a character vector indicating the levels of the categorical covariate if the |

`call` |
the matched call. |

`data` |
the data frame with the variables used in the call. |

For each of the methods used in the call, a list with the following components is obtained:

`"optimal.result"` |
a list with the Generalized Symmetry point and its associated sensitivity and specificity accuracy measures with the corresponding confidence intervals. |

`"AUC"` |
the numerical value of the Area Under the ROC Curve. |

`"rho"` |
the numerical value of the cost ratio |

`"pvalue.healthy"` |
the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the marker in the healthy population. |

`"pvalue.diseased"` |
the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the marker in the diseased population. |

In addition, if the original data are not normally distributed, the following components also appear:

`"lambda"` |
the estimated numerical value of the power used in the Box-Cox transformation. |

`"normality.transformed"` |
a character string indicating if the transformed marker values by the Box-Cox transformation are normally distributed ("yes") or not ("no"). |

`"pvalue.healthy.transformed"` |
the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the Box-Cox transformed marker in the healthy population. |

`"pvalue.diseased.transformed"` |
the numerical value of the p-value obtained by the Shapiro-Wilk normality test for checking the normality assumption of the Box-Cox transformed marker in the diseased population. |

Mónica López-Ratón, Carmen Cadarso-Suárez, Elisa M. Molanes-López and Emilio Letón

Gallop, R.J., Crits-Christoph, P., Muenz, L.R. and Tu, X.M. (2003). Determination and interpretation of the optimal operating point for ROC curves derived through generalized linear models. *Understanding Statistics* **2**, 219-242.

Jiménez-Valverde, A. (2012). Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling. *Global Ecology and Biogeography* **21**, 498-507.

Jiménez-Valverde, A. (2014). Threshold-dependence as a desirable attribute for discrimination assessment: implications for the evaluation of species distribution models. *Biodiversity and Conservation* **23**, 369-385.

Lai, C.Y., Tian, L. and Schisterman, E.F. (2012). Exact confidence interval estimation for the Youden index and its corresponding optimal cut-point. *Comput. Stat. Data Anal.* **56**, 1103-1114.

López-Ratón, M., Cadarso-Suárez, C., Molanes-López, E.M. and Letón, E. (2016). Confidence intervals for the Symmetry point: an optimal cutpoint in continuous diagnostic tests. *Pharmaceutical Statistics* **15(2)**, 178-192.

López-Ratón, M., Molanes-López, E.M., Letón, E. and Cadarso-Suárez, C. (2017). GsymPoint: An R Package to Estimate the Generalized Symmetry Point, an Optimal Cut-off Point for Binary Classification in Continuous Diagnostic Tests. *The R Journal* **9(1)**, 262-283.

Metz, C.E. (1978). Basic Principles of ROC Analysis. *Seminars in Nuclear Medicine* **8**, 283-298.

Molanes-López, E.M. and Letón, E. (2011). Inference of the Youden index and associated threshold using empirical likelihood for quantiles. *Statistics in Medicine* **30**, 2467-2480.

Molanes-López, E.M., Van Keilegom, I. and Veraverbeke, N. (2009). Empirical likelihood for non-smooth criterion functions. *Scandinavian Journal of Statistics* **36**, 413-432.

Remaley, A.T., Sampson, M.L., DeLeo, J.M., Remaley, N.A., Farsi, B.D. and Zweig, M.H. (1999). Prevalence-value-accuracy plots: a new method for comparing diagnostic tests based on misclassification costs. *Clinical Chemistry* **45**, 934-941.

Riddle, D.L. and Stratford, P.W. (1999). Interpreting validity indexes for diagnostic tests: An illustration using the Berg Balance Test. *Physical Therapy* **79**, 939-948.

Rutter, C.M. and Miglioretti, D.L. (2003). Estimating the accuracy of psychological scales using longitudinal data. *Biostatistics* **4**, 97-107.

Thomas, D.R. and Grunkemeier, G.L. (1975). Confidence interval estimation of survival probabilities for censored data. *Journal of the American Statistical Association* **70**, 865-871.

Wand, M.P. and Jones, M.C. (1995). *Kernel smoothing*. Chapman and Hall, London.

Weerahandi, S. (1993). Generalized confidence intervals. *Journal of the American Statistical Association* **88**, 899-905.

Weerahandi, S. (1995). *Exact statistical methods for data analysis*. Springer-Verlag, New York.

Zhou, W. and Jing, B.Y. (2003). Adjusted empirical likelihood method for quantiles. *Annals of the Institute of Statistical Mathematics* **55**, 689-703.

`control.gsym.point`, `summary.gsym.point`

```
library(GsymPoint)

###########################################################
# melanoma data set
# marker: X
# status: group
# Generalized Pivotal Quantity Method ("GPQ"):
# Original data normally distributed
###########################################################
data(melanoma)
gsym.point.GPQ.melanoma <- gsym.point(
  methods = "GPQ", data = melanoma,
  marker = "X", status = "group", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)
summary(gsym.point.GPQ.melanoma)
plot(gsym.point.GPQ.melanoma)

###########################################################
# prostate data set
# marker: marker
# status: status
# Generalized Pivotal Quantity Method ("GPQ"):
# Box-Cox transformed data normally distributed
###########################################################
data(prostate)
gsym.point.GPQ.prostate <- gsym.point(
  methods = "GPQ", data = prostate,
  marker = "marker", status = "status", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)
summary(gsym.point.GPQ.prostate)
plot(gsym.point.GPQ.prostate)

###########################################################
# elastase data set
# marker: elas
# status: status
# Generalized Pivotal Quantity Method ("GPQ"):
# Original data not normally distributed
# Box-Cox transformed data not normally distributed
###########################################################
data(elastase)
gsym.point.GPQ.elastase <- gsym.point(
  methods = "GPQ", data = elastase,
  marker = "elas", status = "status", tag.healthy = 0, categorical.cov = NULL,
  CFN = 1, CFP = 1, control = control.gsym.point(), confidence.level = 0.95,
  trace = FALSE, seed = FALSE, value.seed = 3, verbose = FALSE)
summary(gsym.point.GPQ.elastase)
plot(gsym.point.GPQ.elastase)
```
