# RPV: Function for the determination of an inconclusive interval... In UncertainInterval: Uncertain Interval Methods for Three-Way Cut-Point Determination in Test Results

## Description

This function calculates Predictive Values, Standardized Predictive Values, Interval Likelihood Ratios and Posttest Probabilities of intervals or individual test scores of discrete ordinal tests. This function can correct for the unreliability of the test. It also trichotomizes the test results, with an uncertain interval where the test scores do not allow for an adequate distinction between the two groups of patients. This function is best applied to large samples with a sufficient number of patients for each test score.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14``` ```RPV( ref, test, pretest.prob = NULL, reliability = NULL, roll.length = NULL, extend = TRUE, decision.odds = 2, decision.use = c("standardized.pv", "posttest.probability", "LR", "predictive.value"), preselected.thresholds = c(NULL, NULL), digits = 2, use.perc = TRUE, show.table = FALSE ) ```

## Arguments

 `ref` A vector of two values, ordering 'patients with' > 'patients without', for instance 1, 0. When using a factor, please check whether the correct order of the values is used. `test` A vector of ordinal measurement level with a numeric score for every individual. When using a factor, please check whether the correct order of the values is used. Further, a warning message is issued concerning the calculation of the variance of the test when using a factor. `pretest.prob` (default = NULL) value to be used as pre-test probability. It is used for the calculation of the post-test probabilities. If pretest.prob = NULL, the sample prevalence is used. `reliability` (default NULL) The reliability of the test, used to calculate Standard Error of Measurement (SEM). The reliability is expressed as an applicable correlation coefficient, with values between 0 and 1. A Pearson's Product Moment correlation or an Intra-Class Coefficient (ICC) will do. N.B. A value of 1 assumes perfect reliability, is not realistic and prevents smoothing as it sets `roll.length` to 1. N.B. Setting parameter `roll.length` to a value >= 1 causes the reliability of the test to be ignored. `roll.length` (default = NULL) The frame length of the interval of test scores scores for the calculation of the reliable predictive values. When NULL, it is calculated as `round(SEM)*2+1` (approximately a 68\ confidence interval around the observed test score in which the true score is expected). The roll.length needs to be uneven. When the result is even, a lower value (-1) is assigned. When roll.length is supplied, this is used instead of the calculated value and reliability is ignored. Furthermore, roll.length has to be >= 1 and uneven. Applicable values are 1, 3, 5, etc. roll.length = 1 causes the reliability of the test to be ignored. `extend` (default = TRUE) The Reliable Predictive Values cannot be calculated for the most extreme scores. As the most extreme scores offer most often least uncertain decisions for or against the disease, the values that can be calculated are extended. When extend = FALSE, NA's (NOT Available's) are produced. `decision.odds` (default = 2). The minimum desired odds of probabilities for a positive or negative classification. This equals the desired Likelihood Ratio: the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive. For the uncertain range of test scores both the odds for and the odds against are therefore between 2 and 1/2. A decision.odds of 1 causes all test scores to be used for either positive or negative decisions. The limit for the Predictive Value = decision.odds / (decision.odds+1); for a decision, the Predictive Values needs to be larger than this limit. NB 1 Decision.odds can be a broken number, such as .55/(1-.55), which defines the decision limit for predictive values as .55. The default of 2 is therefore (2/3) / (1 - (2/3)) = 2, equal to a predictive value of .667. NB 2 When a test is more reliable and valid, a higher value for decision.odds can be applied. NB 3 For serious diseases with relatively uncomplicated cures, decision odds can be smaller than one. In that case, a large number of false positives is unavoidable and positive decisions are inherently uncertain. See Sonis(1999) for a discussion. `decision.use` (default = 'standardized.pv'). The probability to be used for decisions. When 'standardized.pv' is chosen, the standardized positive predictive value is used for positive decisions and the standardized negative predictive value is used for negative decisions. When 'posttest.probability' is chosen, pt.prob is used for positive decisions and (1 \- pt.prob) is used for negative decisions. When 'predictive.value' is chosen the rnpv are used for negative decisions and the rppv for positive decisions. N.B. These parameters can be abbreviated as 'stand', 'post' and 'pred'. N.B. The posttest probability is equal to the positive predictive value when pre-test probability = sample prevalence. N.B. The posttest probability is equal to the standardized positive predictive value when pre-test probability = .5. `preselected.thresholds` (default = c(NULL, NULL)). For use in comparisons, when preselected.thresholds has valid values these values are used for the determination of the cut-points. The two cut-points indicate the limits of the uncertain area. Parameter decision.use is ignored. When `preselected.thresholds > preselected.thresholds`, the higher scores are used for positive decisions. When ```preselected.thresholds < preselected.thresholds```, the lower scores are used for positive decisions. An uncertain Interval of a single test score can be determined with in-between values. For instance c(1.5, 0.8) defines an uncertain interval of test score 1 for a descending ordinal test. `digits` (default = 3) the number of digits used in the output. `use.perc` (default = TRUE) Use percentages for the output. When set to FALSE, proportions are used. `show.table` (default = FALSE) Show confusion table.

## Details

This function can be applied to ordinal data. Uncertain test scores are scores that have about the same density in the two distributions of patients with and without the targeted condition. This range is typically found around the optimal cut-point, that is, the point of intersection or Youden index (Schisterman et al., 2005). This function uses as a default the decision odds of ordinal test scores near 1 (default < 2). This results in a limit for the Predictive Values = decision.odds / (decision.odds+1).

N.B. 1: Sp = Negative Decisions | true.neg.status; Se = Positive Decisions | true.pos.status. Please note that the values for Se and Sp are underestimated, as the uncertain test scores are considered as errors, which they are not. (Se and Sp are intended for dichotomous thresholds.). Use `quality.threshold` and `quality.threshold.uncertain` for obtaining respectively quality indices for the test scores when ignoring test scores in the uncertain interval and the quality indices of the test scores within the uncertain interval.

N.B. 2: For the category Uncertain the odds are for the targeted condition (sum of patients with a positive.status)/(sum of patients with negative.status).

N.B. 3: Set roll.length to 1 to ignore the test reliability and obtain raw predictive values, likelihood ratios, etc., that are not corrected for the unreliability of the test.

N.B. 4: In contrast to the other functions, the `RPV` function can detect more than one uncertain interval. More than one uncertain interval almost always signals bad test quality and interpretation difficulties.

Raw predictive values compare the frequencies and provide exact sample values and are most suitable for evaluating the sample results. When prevalence is low, Positive Predictive Values can be disappointingly low, even for tests with high Se values. When prevalence is high, Negative Predictive Values can be low. Reliable Standardized Predictive Values compare the densities (relative frequencies) and are most suitable for comparing the two distributions of the scores for patients with and without the targeted condition.

The predictive values are calculated from the observed frequencies in the two samples of patients with and without the targeted disease. For a range of test scores x, if f0(x) and f1(x) are the frequencies of respectively patients without and with the targeted disease, then the negative predictive value (NPV) can be defined as: NPV(x) = f0(x) / (f0(x) + f1(x)) and the positive predictive value (PPV) as: PPV(x) = f1(x) / (f0(x) + f1(x)). The densities for a range of test scores x can be defined d0(x) = f0(x) / n0 and d1(x) = f1(x) / n1, where n0 and n1 are the number of observed patients in the two samples. The standardized negative predictive value (SNPV) is defined as SNPV(x) = d0(x) / (d0(x) + d1(x)) and the standardized positive predictive value (SPPV) as SPPV(x) = d1(x) / (d0(x) + d1(x)). The two distributions are weighed equally, or in other words, the prevalence is standardized to .5. N.B. The posttest probability is equal to the positive predictive value when the pretest probability is set to the sample prevalence, while the standardized positive predictive value is equal to the posttest probability when the pretest probability is set to .5.

Reliable estimates of the predictive probabilities correct to a certain degree for random variations. In test theory this random effect is estimated with the Standard Error of Measurement (SEM), which is directly dependent on the reliability of the test: `SEM = s * sqrt(1 - r)`, where s is the standard deviation of the test scores and r the estimated reliability of the test (Crocker & Algina, 1986; Harvill, 1991). The true score of a patient lies with some probability (roughly 68%) within a range of +- 1 SEM around the acquired test score. This provides information about the range of test scores that can be expected due to all kinds of random circumstances where no real changing agent has effect.

The results show the obtained values for the sample and are not corrected in any way. The classification 'Uncertain' shows the scores that lead to odds (d1(x) / d0(x)) that are lower than limit. This indicates that it is difficult to base classifications on that range of scores. The positive classifications are less error prone, with realized odds (d1(x) / d0(x)). These odds are close to 1 and smaller than the decision.odds. The negative classifications are less error prone than 'Uncertain' (odds = d0(x) / d1(x)).

The accuracy indices are shown as percentages. Sp = negative classifications given a true negative status. Se = positive classifications given a true positive status. NPV = proportion of Negative Classifications that are correct. PPV = proportion of Positive Classifications that are correct.

## Value

A list of:

\$parameters:

A named vector:

• pretest.prob: provided or calculated pre-test probability. Default, the calculated sample prevalence is used.

• sample.prevalence: the calculated sample prevalence.

• reliability: must be provided; ignored when roll.length = 0.

• SEM: the calculated Standard Error of Measurement (SEM).

• roll.length: the total length of the range around the test score (2 * SEM + 1).

• rel.conf.level: the confidence level of the range, given the reliability.

• limit: the limit applied to the values for calculating the decision result.

\$messages:

Two messages: 1. The test scores are reported for which reliable predictive values could not be calculated and have been extended from the nearest calculated value, 2. the kind of values (probabilities or LR) that are used for decisions.

\$rel.pred.values:

A table the test scores as columns and with rows: N.B. When roll.length is set to 1, the test reliability is ignored and the outcomes are not corrected for unreliability.

• rnpv: (more) reliable negative predictive value. Fitting for reporting sample results.

• rppv: (more) reliable positive predictive value.

• rsnpv: (more) reliable standardized negative predictive value.

• rsppv: (more) reliable standardized positive predictive value.

• rilr: (more) reliable interval likelihood ratio.

• rpt.odds: (more) reliable posttest odds.

• rpt.prob: (more) reliable posttest probabilities.

thresholds.UI:

Numeric values for the thresholds of the uncertain interval. This is NA when there are multiple ranges of test for the uncertain interval.

ranges:

The ranges of test scores for the Negative Classifications, Uncertain, Positive Classifications.

\$result:

Table of results for the current sample, calculated with the provided parameters.

• columns: Negative Classifications, Uncertain, Positive Classifications.

• row total.sample: percentage of the total sample.

• row correct.decisions: percentages of correct negative and positive decisions (NPV and PPV).

• row true.neg.status: percentage of patients with a true negative status for the 3 categories.

• row true.pos.status: percentage of patients with a true positive status for the 3 categories.

• row realized.odds: The odds that are realized in the sample for each of the three categories. NB The odds of the uncertain range of test scores concerns the odds for the targeted condition.

\$table

Show table of counts of decisions x true status.

## References

Sonis, J. (1999). How to use and interpret interval likelihood ratios. Family Medicine, 31, 432–437.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston, 6277 Sea Harbor Drive, Orlando, FL 32887 (\$44.75).

Harvill, L. M. (1991). Standard error of measurement. Educational Measurement: Issues and Practice, 10(2), 33–41.

Landsheer, J. A. (In press). Impact of the Prevalence of Cognitive Impairment on the Accuracy of the Montreal Cognitive Assessment: The advantage of using two MoCA thresholds to identify error-prone test scores. Alzheimer Disease and Associated Disorders. https://doi.org/10.1097/WAD.0000000000000365

`synthdata_NACC` for an example
 ``` 1 2 3 4 5 6 7 8 9 10 11``` ```set.seed(1) # example of a validation sample ref=c(rep(0,1000), rep(1, 1000)) test=round(c(rnorm(1000, 5, 1), rnorm(1000, 8, 2))) # calculated roll.length is invalid. Set to 3. Post test probability equals # Positive Predictive Values. Parameter pretest.prob is set to sample prevalence. RPV(ref, test, reliability = .9, roll.length = 3) # Set roll.length = 1 to ignore test reliability (value of parameter # When pretest.prob is set to .5, the Post-test Probabilities are equal to # the Standardized Positive Predictive Values. RPV(ref, test, pretest.prob = .5, reliability = .9, roll.length = 3) ```