# Vignette Uncertain Interval" In UncertainInterval: Uncertain Interval Methods for Three-Way Cut-Point Determination in Test Results

title: "Vignette Uncertain Interval" author: "Hans Landsheer" date: "6-4-2020" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{"Vignette Uncertain Interval"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\usepackage[utf8]{inputenc}



### How much uncertainty?

How much uncertainty is acceptable depends on the clinical setting. The severity of the illness, the available treatments, their possible side effects, and the availability of better, perhaps more expensive, bio-markers are all relevant parameters. Together, these clinical parameters determine how much uncertainty is acceptable, in which case a test result should lead to additional testing or to active surveillance while awaiting the possible further development of symptoms, and, conversely, how much certainty is necessary for a more definitive decision for or against the presence of the disease.

In the case of prostate cancer, the cancer often grows slowly and often causes no symptoms, while treatment with surgery or radiation carries serious risks, such as incontinence and impotence, and does not necessarily prolong life. Active surveillance, with examinations every six months, or awaiting possible changes in symptoms is often the more desirable course of action. In that case, we need high certainty before taking decisive action and might accept a wider range of test scores as diagnostically insufficient. As the PSA bio-marker is a relatively weak indicator, a wide range of test scores in the middle offers insufficient indication for treatment or non-treatment. In general, a diagnostically insufficient result can lead to the decision to use better or additional measurements that offer a stronger indication.

A wider uncertain interval is obtained by increasing the values of specificity and sensitivity required of the uncertain test scores. Higher values make the test scores within the uncertain interval less uncertain, while the test scores outside the uncertain interval provide increased certainty, that is, an increased percentage of correct decisions given the true status of the patient. In the next code block, the specificity and sensitivity of the uncertain test scores are set to .60, .65 and .70, respectively.

```r
(res = ui.nonpar(psa2b$d, psa2b$tpsa, UI.Se = .60, UI.Sp = .60))
quality.threshold(psa2b$d, psa2b$tpsa, res[1], res[2])$indices[c('MCI.Sp', 'MCI.Se')]

(res = ui.nonpar(psa2b$d, psa2b$tpsa, UI.Se = .65, UI.Sp = .65))
quality.threshold(psa2b$d, psa2b$tpsa, res[1], res[2])$indices[c('MCI.Sp', 'MCI.Se')]

(res = ui.nonpar(psa2b$d, psa2b$tpsa, UI.Se = .70, UI.Sp = .70))
quality.threshold(psa2b$d, psa2b$tpsa, res[1], res[2])$indices[c('MCI.Sp', 'MCI.Se')]
```

The function quality.threshold produces several statistics that describe the qualities of the test scores outside the uncertain interval (MCI, or More Certain Intervals). In this case, when the uncertain interval is widened, the specificity outside the uncertain interval improves faster than the sensitivity. However, in the last case, the sensitivity of the more certain intervals (MCIs) outside the defined uncertain interval deteriorates. The reason for this deterioration is that the number of correctly identified true patients hardly increases, while the total number of true prostate cancer patients with scores outside the uncertain interval becomes smaller. Widening the uncertain interval should therefore not be applied blindly.

### Usage tip: using negated values

The UncertainInterval package uses a uniform definition of thresholds. When a single threshold is applied, test scores >= threshold indicate patients with the targeted disease and test scores < threshold indicate patients without the targeted disease. When an uncertain interval is determined, test scores >= the lower threshold AND <= the upper threshold are considered uncertain. Test scores < the lower threshold indicate the absence of the targeted disease and test scores > the upper threshold indicate its presence.

NOTE 5: The UncertainInterval package assumes that the test values of the patients with the targeted condition are larger than the test values of the patients without the targeted condition. When this is not the case, that is, when lower values indicate the presence of the targeted condition, one can simply negate the test values by putting a minus sign before them. This makes the interpretation straightforward; all quality indices remain applicable.
If one wants to report the non-negated test values (for instance when plotting the distributions), it is sufficient to discard the negative sign. Of course, relative comparisons, such as test score -3 > -5, are then reversed: 3 < 5. When a single threshold of -4 is determined, test scores >= -4 indicate patients with the targeted disease. When one wants to report the non-negated values, test scores <= 4 indicate patients with the targeted disease.

### Using transformations

When using a diagnostic test, the ultimate goal is to apply the test to new patients, other than the patients used for validation of the method. Generalization is therefore as important for any test as its sensitivity, specificity or other validation indicators. Most often, a non-parametric model is optimized for the samples used but lacks a population model. Consequently, a non-parametric validation of a test necessitates considerably larger samples, to ensure that the quality indicators of the test results are not overly adapted to a small sample with particular characteristics. Generalization is easier when a valid parametric model can be applied, such as the bi-normal model. The advantage of a valid parametric model for the test is that we have a population model that allows for greater precision, accuracy and power. Results from a representative sample can be used to estimate population values and can be generalized more easily to other representative samples. The best results for generalization are achieved when a test is developed from the ground up with a specific parametric model and a well-defined population in mind. Unfortunately, this is not often the case for medical tests.

In figure 2 it is easy to see that the PSA test is not very good at identifying true patients, but also not good at identifying patients without prostate cancer. Another approach to this data set is to use a transformation, so that a parametric method can be applied.
From a statistical point of view, we can ask whether a transformation of these test scores could be used to improve the distributions. In this case, we can use the Box-Cox transformation (available in the car package) to improve the distributions (figures 4 & 5).

```r
library(UncertainInterval)
if (!require(car)) install.packages("car", dependencies = TRUE)
library(car)
data(psa2b)
p1 = powerTransform(psa2b$tpsa)
t_tpsa = bcPower(psa2b$tpsa, p1$roundlam)

qqPlot(t_tpsa[psa2b$d == 0])
qqPlot(t_tpsa[psa2b$d == 1])
```


The Box-Cox transformation limits the skewness of the data by applying a power transformation. This transformation maintains the rank order of the data, but changes its distribution. The method searches for the power parameter that brings the transformed data closest to a normal distribution. There is no guarantee that this succeeds, but the two Q-Q plots show that a reasonable approximation of normal distributions has been reached.
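As an aside, the skewness-reducing effect of such a power transformation can be illustrated with plain base R. The log-normal sample, the seed and the simple moment-based `skewness` helper below are illustrative assumptions, not part of the package:

```r
# A log transform is the Box-Cox transform with lambda = 0; for right-skewed
# positive data it pulls the long upper tail in. Minimal sketch:
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3  # moment-based skewness

set.seed(1)
x <- exp(rnorm(1000))   # hypothetical right-skewed (log-normal) test scores

abs(skewness(log(x))) < abs(skewness(x))   # the transform reduces the skewness
```

Here the transformed scores are (by construction) a normal sample, so their skewness is close to zero, whereas the untransformed scores are strongly right-skewed.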

Figure 6 shows how the data are approximated using a bi-normal model.

```r
plotMD(psa2b$d, t_tpsa, model = 'binormal', position.legend = 'topleft')
(res1 = ui.binormal(psa2b$d, t_tpsa))
abline(v = res1$solution, col = 'red')
invBoxCox <- function(x, lambda)
  if (lambda == 0) exp(x) else (lambda*x + 1)^(1/lambda)
invBoxCox(res1$solution, p1$roundlam)
```

Of course, not all problems have disappeared; outliers are still there. The boxplot in figure 7 shows that the outliers in the lower tail are problematic: low scores < -3 occur for both patients and non-patients. These extreme scores in the sample also influence the estimates of the bi-normal distributions somewhat.

```r
outlier_values <- boxplot.stats(t_tpsa)$out  # outlier values
inds <- which(t_tpsa %in% outlier_values)
table(outlier_values, psa2b$d[inds])
boxplot(t_tpsa, main = "", boxwex = 0.1)
# mtext(paste("Outliers: ", paste(round(outlier_values, 2), collapse=", ")),
#       cex=0.6)
```

Leaving out these extreme values may improve the estimates.

NOTE 6: The exercises here are intended as illustrations of how the UncertainInterval package can be used for a test with a difficult distribution. The PSA test was not developed with this transformation or the bi-normal model in mind. Whether this transformation is the best way to deal specifically with PSA test scores would require additional research to ascertain that the transformation has more general validity. Furthermore, there is considerable discussion about the usefulness of PSA scores for the early identification of prostate cancer. Some researchers have argued that the PSA test may have better predictive value in specific age sub-populations (Sadi, 2017).

```r
sel = t_tpsa > -3
plotMD(psa2b$d[sel], t_tpsa[sel], model = 'binormal')
(res55 = ui.binormal(psa2b$d[sel], t_tpsa[sel]))
abline(v = res55$solution, col = 'red')

invBoxCox <- function(x, lambda)
  if (lambda == 0) exp(x) else (lambda*x + 1)^(1/lambda)
invBoxCox(res55$solution, p1$roundlam)
```
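As a sanity check, invBoxCox should undo the forward Box-Cox transform. A minimal base-R sketch: `bc_forward` below mirrors the usual Box-Cox definition (as in car::bcPower), and the values of `x` and `lambda` are hypothetical, not the estimates from psa2b:

```r
# Forward Box-Cox transform and the inverse used in this vignette; applying
# one after the other should recover the original positive values.
bc_forward <- function(x, lambda)
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
invBoxCox <- function(x, lambda)
  if (lambda == 0) exp(x) else (lambda*x + 1)^(1/lambda)

x <- c(0.4, 1, 3.2, 12.5)   # hypothetical PSA-like values
lambda <- 0.13              # hypothetical rounded power parameter

all.equal(invBoxCox(bc_forward(x, lambda), lambda), x)   # TRUE
```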


Of course, a wider range of test scores can be considered as diagnostically uncertain. In the next block we compare higher values for sensitivity and specificity of the uncertain test scores and show the obtained results for the outer, more certain regions:

```r
res60 = ui.binormal(psa2b$d[sel], t_tpsa[sel], UI.Se = .60, UI.Sp = .60)
res60$results
quality.threshold.uncertain(psa2b$d, t_tpsa, res60$solution[1], res60$solution[2])$indices[c('UI.Se', 'UI.Sp')]
quality.threshold(psa2b$d, t_tpsa, res60$solution[1], res60$solution[2])$indices[c('MCI.Se', 'MCI.Sp')]

res65 = ui.binormal(psa2b$d[sel], t_tpsa[sel], UI.Se = .65, UI.Sp = .65)
res65$results
quality.threshold.uncertain(psa2b$d, t_tpsa, res65$solution[1], res65$solution[2])$indices[c('UI.Se', 'UI.Sp')]
quality.threshold(psa2b$d, t_tpsa, res65$solution[1], res65$solution[2])$indices[c('MCI.Se', 'MCI.Sp')]

res70 = ui.binormal(psa2b$d[sel], t_tpsa[sel], UI.Se = .70, UI.Sp = .70)
res70$results
quality.threshold.uncertain(psa2b$d, t_tpsa, res70$solution[1], res70$solution[2])$indices[c('UI.Se', 'UI.Sp')]
quality.threshold(psa2b$d, t_tpsa, res70$solution[1], res70$solution[2])$indices[c('MCI.Se', 'MCI.Sp')]
```


The statistics estimated for the uncertain interval of the modeled distributions are exactly what we asked for: sensitivity and specificity of .60, .65 and .70, respectively. These estimates are no longer based solely on the sample, and the values observed in the sample deviate, as can be expected. In the last example, with sensitivity and specificity set to .7, the sample values are .73 and .67, respectively. When the bi-normal model is valid for these test scores, the bi-normal estimates of .7 can be considered better, but this requires additional proof.

The obtained sample values for the outer regions show systematic improvements when sensitivity and specificity for the uncertain test scores are higher. These sample values are provided by the quality.threshold function. In the last case, specificity has increased to .97 while sensitivity has increased to .84.

A further question is how many patients are considered as having an uncertain test result, when using these values for sensitivity and specificity for determination of the uncertain interval. The extended confusion tables are produced by the quality.threshold function. The diagnoses based on the uncertain test scores are labeled as NA:

```r
quality.threshold(psa2b$d, t_tpsa, res55$solution[1], res55$solution[2])$table
quality.threshold(psa2b$d, t_tpsa, res60$solution[1], res60$solution[2])$table
quality.threshold(psa2b$d, t_tpsa, res65$solution[1], res65$solution[2])$table
quality.threshold(psa2b$d, t_tpsa, res70$solution[1], res70$solution[2])$table
```
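For intuition only, the NA labelling in such extended tables can be mimicked in a few lines of base R. All values below (`d`, `score`, `lo`, `hi`) are made up for the sketch, not taken from psa2b:

```r
# Trichotomous decisions: below lo -> negative (0), above hi -> positive (1),
# inside [lo, hi] -> uncertain (NA).
d     <- c(0, 0, 0, 1, 1, 1, 0, 1)                  # hypothetical true status
score <- c(1.2, 3.4, 0.8, 5.1, 2.9, 6.0, 2.5, 1.9)  # hypothetical test scores
lo <- 2; hi <- 4                                    # hypothetical thresholds

decision <- ifelse(score < lo, 0, ifelse(score > hi, 1, NA))
table(truth = d, decision = decision, useNA = 'ifany')
```

The cross-table then shows, per true status, how many patients fall in the negative, positive and uncertain (NA) columns, just as in the tables above.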

```r
res70$solution
invBoxCox(res70$solution, p1$roundlam)
```

When applying the last results, 483 patients in the sample have a test score that is considered uncertain (PSA scores between .99 and 12.5) and should be monitored, 153 patients would receive the decision that their likelihood of having prostate cancer is very low (with 8 errors), and 47 patients would receive a decision for immediate further diagnostics and possible treatment of their prostate cancer (with 4 errors). Clearly, patients with test scores within the uncertain interval should receive more frequent active surveillance when their PSA score is higher. The PSA test is a weak bio-marker. A better predictive result is possible with a better test and/or by integrating additional relevant information. A better predictive result would yield a smaller group to monitor and a lower number of errors.

## Ordinal scales and predictive values

Recently, the function RPV has been added to the UncertainInterval package. This function calculates Predictive Values, Standardized Predictive Values, Interval Likelihood Ratios and Post-test Probabilities of individual test scores, or intervals of test scores, of discrete ordinal tests. Another vignette (UI_RPV) illustrates its use. Furthermore, this function can correct for the unreliability of the test. It also trichotomizes the test results, with an uncertain interval where the test scores do not allow for an adequate distinction between the two groups of patients. This function is best applied to large samples with a sufficient number of patients for each test score. As predictive values are calculated for individual test scores, relatively large samples are required for its usage. When there is a sufficient number of patients with and without the condition for a specific test score, it is possible to estimate whether this test score is uncertain.
As in the other functions for the estimation of an uncertain interval, uncertain test scores are defined as scores whose densities are about equal in the two distributions of patients with and without the targeted disease. By default, this function considers ordinal test scores with decision odds near 1 (default < 2) as uncertain. The corresponding limit for the predictive values is decision.odds / (decision.odds + 1); by default, predictive values < .6667.

NOTE 7: The default of .667 is less stringent than Se = Sp = .55 and allows for a wider uncertain interval. A limit of .55 for the predictive value would correspond to odds of .55 / (1 - .55) = 1.22.

The use of standardized predictive values is recommended for the estimation of the uncertain interval. Standardized predictive values can be calculated directly from the densities (relative frequencies) of the two distributions of patients with and without the targeted condition. The standardized negative predictive value (SNPV) is defined as SNPV(x) = d0(x) / (d0(x) + d1(x)) and the standardized positive predictive value (SPPV) as SPPV(x) = d1(x) / (d0(x) + d1(x)), where d0(x) is the density of test score x for patients without the targeted disease and d1(x) is the density of test score x for patients with the targeted disease. The two distributions are therefore weighed equally; in other words, the prevalence is standardized to .5. In this way, a predictive value can be considered as the probability that the patient's test score is selected from the distribution of test scores of patients with (positive predictive value) or without (negative predictive value) the targeted disease. When calculated for the same (interval of) test scores, SNPV = 1 - SPPV and NPV = 1 - PPV. Clearly, when the prevalence is low, the number of patients with the disease who have a specific test score can be very low. In that case, estimates of the (standardized) predictive values of an individual test score are not very reliable or valid.
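For intuition, these definitions can be computed directly in base R. The densities `d0` and `d1` below are hypothetical relative frequencies for five ordinal scores, not data from the package:

```r
d0 <- c(.30, .25, .20, .15, .10)  # hypothetical densities, condition absent
d1 <- c(.05, .10, .20, .32, .33)  # hypothetical densities, condition present

SNPV <- d0 / (d0 + d1)            # standardized negative predictive value
SPPV <- d1 / (d0 + d1)            # standardized positive predictive value
all.equal(SNPV + SPPV, rep(1, 5)) # the two always sum to 1

# With decision odds of 2, scores where neither predictive value reaches
# 2 / (2 + 1) = 2/3 would be flagged as uncertain (here: the middle score).
uncertain <- SNPV < 2/3 & SPPV < 2/3

# Post-test probability from the likelihood ratio LR = d1/d0 and a pretest
# probability p; with p = .5 it coincides with SPPV.
posttest <- function(p, LR) { odds <- LR * p / (1 - p); odds / (1 + odds) }
all.equal(posttest(.5, d1 / d0), SPPV)   # TRUE
```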
This is the reason why large samples are needed. N.B. The post-test probability is equal to the positive predictive value when the pretest probability is set to the sample prevalence, while the standardized positive predictive value is equal to the post-test probability when the pretest probability is set to .5.

## Short ordinal scales

Many medical tests have only a limited number of discrete values that can be roughly ordered from 'good' to 'bad'. In other words, these scales are at an ordinal level and are short in the sense that they have a low number of discrete values. As there are only a few discrete values, an uncertain interval would cover only one or a few different test scores. For a single test score, it is not possible to define a threshold (intersection) which would allow for the definition of sensitivity and specificity of the uncertain test scores. When just a few scores are considered uncertain, the sensitivity or specificity of these uncertain test scores can deviate wildly from the desired values. In other words, the methods presented above cannot be applied. One approach to the problem is simple visual inspection; the other is applying the ui.ordinal function, which uses criteria other than sensitivity and specificity for the determination of an uncertain interval.

### Visual inspection

For visual inspection, the plotMD function can display an exact representation of sampled data with a limited number of discrete values when model = 'ordinal' is used. The data of Tosteson & Begg (1988) illustrate this. There are 63 patients without cancer and 33 patients who have cancer (in this case either breast cancer or colon cancer). These patients have received a score from 1 to 5, indicating the presence of metastatic disease to the liver. A liver metastasis is a malignant tumor in the liver that has spread from another organ that has been affected by cancer.

```r
data("tostbegg2")
sel = tostbegg2$type == 0
plotMD(ref = tostbegg2$d, test = tostbegg2$y, model = 'ordinal')
```


Visual inspection shows that the score 3 is the most uncertain indicator. The radiologists used this score to indicate the possibility of metastatic disease to the liver. This score is the most uncertain and is best used as an indication that further data collection is necessary. But it is also clear that all scores carry some uncertainty concerning a decision for or against the presence of metastatic cancer.
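The visual impression can be backed up numerically: the most uncertain score is the one where the relative frequencies of the two groups are closest. A base-R sketch with hypothetical counts loosely patterned after a 5-point rating (not the actual tostbegg2 counts):

```r
score <- 1:5
n0 <- c(30, 15, 9, 6, 3)   # hypothetical counts, metastasis absent  (n = 63)
n1 <- c(2, 3, 8, 9, 11)    # hypothetical counts, metastasis present (n = 33)

d0 <- n0 / sum(n0)         # relative frequencies per group
d1 <- n1 / sum(n1)

score[which.min(abs(d0 - d1))]   # the score with the most similar densities
```

With these counts, the middle score (3) has the smallest density difference, matching the visual impression.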

### Using the ui.ordinal function

It should be clear that with so few scores, all quality indices concern the limited score values and cannot be very precise. However, it is possible to calculate an interval of uncertain test scores using a loss function. The documentation of this function provides the details. The function ui.ordinal provides additional information about possible choices for the uncertain interval:

```r
ui.ordinal(ref = tostbegg2$d, test = tostbegg2$y, return.all = TRUE)
```


The warning indicates that no solution is found for the given (default) ratio constraints (lower ratio .8, upper ratio 1.25). These ratios limit the ratio of patients with and without the disease within the uncertain interval to between .8 and 1.25. The selected candidate is the same as the one chosen on the basis of visual inspection. The ui.ordinal function directly provides several quality indices, both for possible uncertain intervals and for the test scores outside the uncertain interval (the MCIs).
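The ratio constraint itself is simple to state in code. The counts below are hypothetical, chosen only to show the check:

```r
# Within a candidate uncertain interval, the ratio of patients with the
# disease to patients without it should lie between .8 and 1.25.
n1_in <- 8   # hypothetical: diseased patients inside the candidate interval
n0_in <- 9   # hypothetical: non-diseased patients inside the interval

ratio <- n1_in / n0_in
ratio >= .8 & ratio <= 1.25   # this candidate satisfies the constraint
```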

## References

Etzioni R, Pepe M, Longton G, Hu C, Goodman G (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making 19:242-51.

Landsheer, JA (2016). Interval of Uncertainty: An Alternative Approach for the Determination of Decision Thresholds, with an Illustrative Application for the Prediction of Prostate Cancer. PloS One, 11(11), e0166007.

Landsheer, J. A. (2018). The Clinical Relevance of Methods for Handling Inconclusive Medical Test Results: Quantification of Uncertainty in Medical Decision-Making and Screening. Diagnostics, 8(2), 32. https://doi.org/10.3390/diagnostics8020032

Sadi, M. V. (2017). PSA screening for prostate cancer. Revista Da Associação Médica Brasileira, 63(8), 722–725.

Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005). Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology, 73–81.

Tosteson AN, Begg CB (1988). A general regression methodology for ROC curve estimation. Medical Decision Making 8:204-15.

