ui.ordinal: Function to explore possible uncertain intervals of ordinal...

View source: R/ui.ordinal.R

Description

This function is intended for ordinal tests with a small number of distinct test values (for instance 20 or fewer). It explores possible uncertain intervals (UI) of the test results of the two groups. Compared with other functions for the determination of the uncertain interval, it allows considerable fine-tuning of the characteristics of the interval of uncertain test scores, and it is intended for tests with a limited number of ordered values and/or small samples.

When a limited number of distinguishable scores is available, estimates will be coarse. When more than 20 values can be distinguished, ui.nonpar or ui.binormal may be preferred. When a sufficiently large data set is available, the function RPV may be preferred for the analysis of discrete ordered data.

Usage

ui.ordinal(
  ref,
  test,
  select.max = c("MCI.Sp+MCI.Se", "MCI.C", "MCI.Acc", "MCI.Se", "MCI.Sp", "MCI.n",
    "All"),
  constraints = c(C = 0.57, Acc = 0.6, lower.ratio = 0.8, upper.ratio = 1.25),
  weights = c(1, 1, 1),
  intersection = NULL,
  return.all = FALSE,
  ...
)

Arguments

ref

The reference standard. A column in a data frame or a vector indicating the classification by the reference test. The reference standard must be coded either as 0 (absence of the condition) or 1 (presence of the condition). When mean(test[ref == 0]) < mean(test[ref == 1]), it is assumed that higher test scores indicate presence of the condition; otherwise, it is assumed that lower test scores indicate presence of the condition.

test

The test or predictor under evaluation. A column in a data set or vector indicating the test results on an ordinal scale.

select.max

Selects the candidate thresholds on the basis of a desired property of the More Certain Intervals (MCI). The criteria are: maximum Se+Sp (default), maximum C (AUC), maximum Accuracy, maximum Se, maximum Sp, and maximum size of the MCI. The last alternative, 'All', shows all possible details.

constraints

Sets upper constraints for various properties of the uncertain interval: C-statistic (AUC), Acc (accuracy), and the lower and upper limits of the ratio of the proportions with and without the targeted condition. The default values are C = .57, Acc = .6, lower.ratio = .8, upper.ratio = 1.25. These values implement the desired uncertainty of the uncertain interval. The value of C (AUC) is considered the most important and has the most restrictive default value. For Acc and C, the values closest to the desired value are found and then all smaller values are considered. The other two constraints are straightforward lower and upper limits of the ratio between the number of patients with and without the targeted disease. If you want to change the values of these constraints, it is necessary to name all values. C = 1 or Acc = 1 excludes C or accuracy, respectively, as a selection criterion. If no solution is found, the best candidate is shown together with a warning message.
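
The ratio constraints can be illustrated with a minimal base R sketch. It is not the package's internal code: it assumes the ratio is the count of patients with the condition divided by the count without it inside a candidate interval, and the names d0, d1 and candidates are hypothetical. The frequencies are those of the first example in the Examples section.

```r
# Frequencies per test score (1..5), taken from the first example
d0 <- c(165, 14, 16, 55, 10)   # persons without the condition
d1 <- c(15, 11, 13, 55, 164)   # persons with the condition

# Default constraints as listed in the Usage section
constraints <- c(C = 0.57, Acc = 0.6, lower.ratio = 0.8, upper.ratio = 1.25)

# Enumerate every candidate interval l:u with l <= u
h <- length(d0)
candidates <- expand.grid(l = 1:h, u = 1:h)
candidates <- candidates[candidates$l <= candidates$u, ]

# Assumed definition: ratio of counts with / without the condition
candidates$ratio <- mapply(
  function(l, u) sum(d1[l:u]) / sum(d0[l:u]),
  candidates$l, candidates$u
)

# Flag the intervals whose ratio lies within the lower and upper limits
candidates$ratio.ok <-
  candidates$ratio >= constraints[["lower.ratio"]] &
  candidates$ratio <= constraints[["upper.ratio"]]

print(candidates)
```

In this sketch the interval 2:4 passes the ratio check, while a one-sided interval such as 1:1 does not.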

weights

(Default = c(1, 1, 1)). Vector with weights for the loss function. weights[1] is the weight of false negatives, weights[2] is the weight for loss in the uncertain interval (deviations from equal chances to belong to either distribution), and weights[3] is the weight for false positives. When a weight is set to a larger value, thresholds are selected that make the corresponding error smaller while the corresponding area grows smaller.

intersection

(Default = NULL). Optional value to be used as the value for the intersection. If no value is supplied, the intersection is calculated using the function get.intersection(ref = ref, test = test, model='ordinal'), which provides a gaussian kernel estimate of the intersection.
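
The idea of a gaussian kernel estimate of the intersection can be sketched with the base R function density. This is an assumption-laden illustration, not the implementation of get.intersection: it uses adjust = 2 (the default mentioned under Details), evaluates both densities on a shared grid, and reports the points where their difference changes sign. The variable names are hypothetical and the package's own estimate may differ numerically.

```r
# Test scores of the first example, expanded from their frequencies
test0 <- rep(1:5, times = c(165, 14, 16, 55, 10))  # norm group
test1 <- rep(1:5, times = c(15, 11, 13, 55, 164))  # patients

# Gaussian kernel density estimates on the same grid of 512 points
dens0 <- density(test0, adjust = 2, from = 1, to = 5, n = 512)
dens1 <- density(test1, adjust = 2, from = 1, to = 5, n = 512)

# A sign change in the difference marks a crossing of the two densities
delta <- dens0$y - dens1$y
idx <- which(diff(sign(delta)) != 0)

# Approximate each crossing by the midpoint of the two neighbouring grid points
x_cross <- (dens0$x[idx] + dens0$x[idx + 1]) / 2
print(x_cross)
```

With more than one crossing, x_cross has several elements, which is the situation in which a single supplied intersection value may be useful.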

return.all

(Default = FALSE). When TRUE $data.table and $uncertain.interval are included in the output.

...

Further parameters that can be transferred to the density function.

Details

Due to the limited possibilities of short scales, it is more difficult to determine a suitable uncertain interval than with longer scales. This problem is aggravated when samples are small. For any threshold determination, one needs a large representative sample (200 or larger). If there are no test scores below the intersection in the candidate uncertain area, Sp of the Uncertain Interval (UI.Sp) is not available, while UI.Se equals 1. The essential question is always whether the patients with test scores inside the uncertain interval can be sufficiently distinguished.

The candidate intervals are selected on various properties of the uncertain interval. The defaults are C (AUC) lower than .57, Acc (accuracy) lower than .6, and the ratio of proportions of persons with / without the targeted condition between .8 and 1.25. These criteria ensure that all candidates for the uncertain interval have insufficient accuracy. The second criterion is the desired property of the More Certain Intervals (see the select.max parameter). The model used is 'ordinal'. For this model, the default for the adjust parameter sent to the density function is 2, but you can enter another value such as adjust = 1.

Dichotomous thresholds are inclusive: a score equal to the threshold counts as a positive score (patient). Positive scores are therefore those >= threshold when the mean score for ref == 0 is lower than for ref == 1, and those <= threshold when the mean score for ref == 0 is higher.
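
The inclusive thresholding rule can be sketched in base R as follows. The threshold value 4 is a hypothetical choice for illustration only, and higher_is_positive is an assumed helper name; the data are those of the first example.

```r
# Data of the first example
test0 <- rep(1:5, times = c(165, 14, 16, 55, 10))  # norm group
test1 <- rep(1:5, times = c(15, 11, 13, 55, 164))  # patients
ref  <- c(rep(0, length(test0)), rep(1, length(test1)))
test <- c(test0, test1)

# Direction of the test: TRUE when the mean score for ref == 0 is lower
higher_is_positive <- mean(test[ref == 0]) < mean(test[ref == 1])

threshold <- 4  # hypothetical threshold, for illustration only

# The threshold itself counts as positive in both directions
positive <- if (higher_is_positive) test >= threshold else test <= threshold

# Cross-tabulate the reference standard against the dichotomized test
print(table(ref = ref, positive = positive))
```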

Both the Youden threshold and the gaussian kernel estimate of the intersection (used by default) are estimates of the true intersection. In some circumstances the Youden threshold can be preferred, especially when the data show spikes for the lowest and/or highest values. In many situations the gaussian kernel estimate is to be preferred, especially when there is more than one intersection. In many situations the two estimates are close to each other, but especially for coarse data they might differ.

Discussion of the first example (please run the code first): Visual inspection of the mixed densities with the function plotMD shows that distinguishing patients with and without the targeted condition is almost impossible for test scores 2, 3 and 4. Sensitivity and Specificity of the uncertain interval should not be too far from .5. In the first example, the first interval (3:3) has no scores lower than the intersection (3), and therefore UI.Sp is not available and UI.Se = 1. The UI.ratio indicates whether the number of patients with and without the condition is equal in this interval. For these 110 patients, a diagnosis of uncertainty is probably the best choice. The second interval (3:4) has a UI.Sp of .22, which is a large deviation from .5. In this slightly larger interval, the patients with a test score of 3 have a slightly larger probability of belonging to the group without the condition. UI.Se is .8. UI.ratio is close to 1, which makes it a feasible candidate. The third interval (2:4) has a UI.Sp of .35, a UI.Se of .70 and a UI.ratio that is still close to 1. The other intervals show either Se or Sp values that deviate strongly from .5, which makes them unsuitable choices.

Probably the easiest way to determine the uncertain interval is to choose the interval with minimum loss. This is interval (2:4). Dichotomization loss L2 can be defined as the sum of false negatives and false positives. The Youden threshold minimizes these. The loss formula L3 for trichotomization of ordinal test scores is:

L3 = 1/N * (sum(abs(d0[u:l] - d1[u:l])) + sum(d1[1:(l-1)]) + sum(d0[(u+1):h]))

where d0 represents the test scores of the norm group, d1 represents the test scores of the targeted patient group, l is the lower limit of the uncertain interval, u the upper limit, the first test score is enumerated 1 and the last test score is enumerated h. N is the total number of all persons with test scores.
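
As a worked illustration, the formula can be evaluated in base R for every candidate interval of the first example. This is a sketch under assumptions, not the package's code: it treats d0 and d1 as the frequencies per test score, uses unit weights, and the helper name l3_loss is hypothetical. The index helpers avoid the R pitfall that 1:(l-1) counts backwards when l = 1.

```r
# Frequencies per test score (1..5), taken from the first example
d0 <- c(165, 14, 16, 55, 10)   # norm group
d1 <- c(15, 11, 13, 55, 164)   # targeted patient group

# Hypothetical helper: L3 loss of uncertain interval l:u with unit weights
l3_loss <- function(d0, d1, l, u) {
  h <- length(d0)
  N <- sum(d0) + sum(d1)
  below <- seq_len(l - 1)                      # scores below the interval
  above <- if (u < h) (u + 1):h else integer(0)  # scores above the interval
  (sum(abs(d0[l:u] - d1[l:u])) +   # deviation from equality inside the interval
     sum(d1[below]) +              # false negatives
     sum(d0[above])) / N           # false positives
}

# Evaluate the loss for every interval with l <= u
h <- length(d0)
grid <- expand.grid(l = 1:h, u = 1:h)
grid <- grid[grid$l <= grid$u, ]
grid$L3 <- mapply(function(l, u) l3_loss(d0, d1, l, u), grid$l, grid$u)

# Interval with the smallest loss
best <- grid[which.min(grid$L3), ]
print(best)
```

For these frequencies the smallest loss is obtained for the interval 2:4, with a loss of 31 / 518.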

Loss L is higher when the deviation from equality is higher in the uncertain area, higher when the number of False Negatives is higher, and higher when the number of False Positives is higher. The loss of a single threshold method equals 1 - its Accuracy. In this example, the minimum Loss is found with interval (2:4). As this agrees with values for UI.C and UI.ratio that sufficiently indicate the uncertainty of these test scores, this seems the most suitable choice: patients with test scores 2 to 4 are almost as likely to come from either population. The remaining cases outside the uncertain interval (2:4) show high C, Accuracy, Specificity and Sensitivity.
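
The statement that the loss of a single threshold equals 1 - Accuracy can be checked with a short base R sketch. It assumes that higher scores indicate the condition and that a score equal to the threshold counts as positive; the names loss_l2 and accuracy are hypothetical.

```r
# Frequencies per test score (1..5), taken from the first example
d0 <- c(165, 14, 16, 55, 10)   # norm group
d1 <- c(15, 11, 13, 55, 164)   # targeted patient group
h <- length(d0)
N <- sum(d0) + sum(d1)

# Candidate thresholds; scores >= t are classified as positive
thresholds <- 2:h

# Dichotomization loss: (false negatives + false positives) / N
loss_l2 <- sapply(thresholds, function(t) {
  (sum(d1[1:(t - 1)]) + sum(d0[t:h])) / N
})

# Accuracy: (true negatives + true positives) / N
accuracy <- sapply(thresholds, function(t) {
  (sum(d0[1:(t - 1)]) + sum(d1[t:h])) / N
})

# The two quantities always sum to 1
print(data.frame(threshold = thresholds, loss_l2, accuracy))
```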

Value

List of values:

$Youden

A vector of statistics concerning the maximized Youden index:

References

Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3(1), 32-35. https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3

Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005). Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology, 73-81.

Landsheer, J. A. (2016). Interval of Uncertainty: An alternative approach for the determination of decision thresholds, with an illustrative application for the prediction of prostate cancer. PLOS One.

Landsheer, J. A. (2018). The Clinical Relevance of Methods for Handling Inconclusive Medical Test Results: Quantification of Uncertainty in Medical Decision-Making and Screening. Diagnostics, 8(2), 32. https://doi.org/10.3390/diagnostics8020032

See Also

plotMD or barplotMD for plotting the mixed densities of the test values. density for the parameters of the density function. ui.nonpar or ui.binormal can be used when more than 20 values can be distinguished on the ordinal test scale. When a large data set for an ordinal test is available, one might consider RPV.

Examples

library(UncertainInterval) # provides ui.ordinal, plotMD and ui.binormal

# A short test with 5 ordinal values
test0 <- rep(1:5, times = c(165, 14, 16, 55, 10)) # test results norm group
test1 <- rep(1:5, times = c(15, 11, 13, 55, 164)) # test results of patients
ref <- c(rep(0, length(test0)), rep(1, length(test1)))
test <- c(test0, test1)
table(ref, test)
plotMD(ref, test, model="ordinal") # visual inspection
# In this case we may prefer the Youden estimate
ui.ordinal(ref, test, intersection="Youden", select.max="All")
# Same solution, but other layout of the results:
ui.ordinal(ref, test, select.max=c("MCI.Sp+MCI.Se", "MCI.C", "MCI.Acc",
                                   "MCI.Se", "MCI.Sp", "MCI.n"))
# Using a gaussian kernel estimate of the true intersection
# gives the same best result for the uncertain interval.
# The estimates for ui.Se, ui.Sp and ui.Acc differ for another intersection:
ui.ordinal(ref, test, select.max="All")

# A longer scale: two normal variables binned into 30 ordinal categories
nobs <- 1000
set.seed(6)
Z0 <- rnorm(nobs, mean = 0)
b0 <- seq(-5, 8, length.out = 31)
f0 <- cut(Z0, breaks = b0, labels = 1:30)
x0 <- as.numeric(levels(f0))[f0]
Z1 <- rnorm(nobs, mean = 1, sd = 1.5)
f1 <- cut(Z1, breaks = b0, labels = 1:30)
x1 <- as.numeric(levels(f1))[f1]
ref <- c(rep(0, nobs), rep(1, nobs))
test <- c(x0, x1)
plotMD(ref, test, model='ordinal') # looks like binormal
# looks less binormal, but in fact it is a useful approximation:
plotMD(ref, test, model='binormal')
ui.ordinal(ref, test)
ui.binormal(ref, test) # compare application of the bi-normal model

UncertainInterval documentation built on March 3, 2021, 1:10 a.m.