#'Function for the determination of an inconclusive interval for continuous test
#'scores
#'
#'@name ui.nonpar
#'@description This function uses a non-parametric approach to determine an
#' interval around the intersection of the two distributions of individuals
#' without (0) and with (1) the targeted condition. The Uncertain Interval is
#' generally defined as an interval below and above the intersection, where the
#' densities of the two distributions of patients with and without the targeted
#' condition are about equal. These test scores are considered inconclusive
#' for the decision for or against the targeted condition. The interval is
#' restricted both by a maximum sensitivity of the test scores within the
#' uncertain interval (UI.Se) and by a maximum specificity of the test scores
#' within the uncertain interval (UI.Sp).
#'@param ref The reference standard. A column in a data frame or a vector
#' indicating the classification by the reference test. The reference standard
#' must be coded either as 0 (absence of the condition) or 1 (presence of the
#' condition)
#'@param test The index test or test under evaluation. A column in a dataset or
#' vector indicating the test results in a continuous scale.
#'@param UI.Se (default = .55). The sensitivity of the test scores within the
#' uncertain interval is either limited to this value (select = 'limited') or
#' comes nearest to this value (select = 'nearest'). A value <= .5 is not
#' meaningful.
#'@param UI.Sp (default = .55). The specificity of the test scores within the
#' uncertain interval is either limited to this value (select = 'limited') or
#' comes nearest to this value (select = 'nearest'). A value <= .5 is not
#' meaningful.
#'@param intersection (default = NULL) When NULL, the intersection is calculated
#' with \code{get.intersection}, which uses the kernel density method to obtain
#' the intersection. When another value is assigned to this parameter, this
#' value is used instead.
#'@param return.first (default = TRUE) Return only the widest possible interval,
#' given the restrictions. When FALSE, all calculated intervals with their
#' sensitivity and specificity are returned. NOTE: This function does not
#' always find a suitable interval and can return a vector of NA values.
#'@param select (default = 'nearest') If 'nearest', the sensitivity and
#' specificity of the uncertain interval are nearest to UI.Se and UI.Sp,
#' respectively. When 'limited', the solutions have an uncertain interval with
#' a sensitivity and specificity limited by UI.Se and UI.Sp, respectively.
#'@param direction Default = 'auto'. Direction when comparing controls with
#' cases. Often, the controls have lower values than the cases (direction =
#' '<'). When 'auto', mean comparison is used to determine the direction.
#'
#'@details{ This function can be used for a test without a defined distribution
#'of the continuous test scores. The Uncertain Interval is generally defined as
#'an interval below and above the intersection, where the densities of the two
#'distributions of patients with and without the targeted condition are about
#'equal. For the definition of the uncertain interval, this function uses a
#'sensitivity and a specificity of the uncertain test scores near or below a
#'desired value (default .55).
#'
#'This essentially non-parametric function finds the best possible solution for
#'a sample. It can be applied to tests with continuous scores or to tests with
#'about twenty or more ordered test scores.
#'
#'At its core, the \code{ui.nonpar} function is non-parametric, but it uses a
#'Gaussian kernel for estimating the intersection between the two
#'distributions. Always check whether your results are within reason. If the
#'results are unsatisfactory, first check the intersection. The \code{density}
#'function allows for other kernels than the Gaussian; a better estimate of the
#'intersection can sometimes be obtained with a more suitable kernel. The
#'parameter \code{intersection} can be used to supply such an alternative
#'estimate to \code{ui.nonpar} (see the last example below).
#'
#'Furthermore, only a single intersection is assumed (or a second intersection
#'where the overlap is negligible). It should be noted that in most cases, a
#'test with more than one intersection with non-negligible overlap is
#'problematic and difficult to apply.
#'
#'The Uncertain interval method is developed for continuous distributions,
#'although it can be applied to ordered tests with distinguishable
#'distributions. When a test is used with fewer than 20 discernible values, a
#'warning is issued. The method may work satisfactorily, but results should
#'always be checked carefully.
#'
#'In general, when estimating decision thresholds, a sample of sufficient size
#'should be used. It is recommended to use at least a sample of 100 patients
#'with the targeted condition, and a 'healthy' sample (without the targeted
#'condition) of the same size or larger.
#'
#'The Uncertain interval method is not always capable of delivering results,
#'especially when select == 'limited'. Clearly, when there is no overlap between
#'the two distributions, there cannot be an uncertain interval. A very small
#'region of overlap can also limit the possibilities of finding a solution. When
#'no solution is found, a vector of NA values is returned. }
#'@return {A \code{data.frame} of \describe{ \item{cp.l}{ Lower bound of the
#' Uncertain interval.} \item{cp.h}{ Upper bound of the Uncertain interval.}
#' \item{FN}{ Count of false negatives within the Uncertain interval.}
#' \item{TP}{ Count of true positives within the Uncertain interval.}
#' \item{TN}{ Count of true negatives within the Uncertain interval.}
#' \item{FP}{ Count of false positives within the Uncertain interval.}
#' \item{UI.Se}{ Sensitivity of the test scores within the Uncertain
#' interval.}
#' \item{UI.Sp}{ Specificity of the test scores within the
#' Uncertain interval.} } Only a single row is returned when parameter
#' \code{return.first} = TRUE (default).}
#'@importFrom reshape2 melt
#'@references Landsheer, J. A. (2016). Interval of Uncertainty: An Alternative
#' Approach for the Determination of Decision Thresholds, with an Illustrative
#' Application for the Prediction of Prostate Cancer. PloS One, 11(11),
#' e0166007.
#'
#' Landsheer, J. A. (2018). The Clinical Relevance of Methods for Handling
#' Inconclusive Medical Test Results: Quantification of Uncertainty in Medical
#' Decision-Making and Screening. Diagnostics, 8(2), 32.
#' https://doi.org/10.3390/diagnostics8020032
#'
#' @examples
#' # A simple test model
#' set.seed(1)
#' ref=c(rep(0,500), rep(1,500))
#' test=c(rnorm(500,0,1), rnorm(500,1,1))
#' ui.nonpar(ref, test, select='limited')
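#'
#' # The returned estimates include the sensitivity (UI.Se) and specificity
#' # (UI.Sp) of the test scores within the uncertain interval; with the default
#' # select = 'nearest' these come as close as possible to the requested values.
#' (res <- ui.nonpar(ref, test))
#' res[c('UI.Se', 'UI.Sp')]
#' # all calculated intervals instead of only the widest one:
#' ui.nonpar(ref, test, return.first = FALSE)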
#'
#' ref = c(rep(0,20), rep(1,20))
#' test= c(rnorm(20), rnorm(20, mean=1))
#' ui.nonpar(ref, test)
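#'
#' # One possible way to supply a different intersection estimate: here the
#' # crossing point of two 'epanechnikov' kernel densities, evaluated on a
#' # common grid between the two group means, is passed to the intersection
#' # parameter (illustrative sketch; any other estimate can be supplied).
#' d0 <- stats::density(test[ref == 0], kernel = 'epanechnikov',
#'                      from = min(test), to = max(test), n = 512)
#' d1 <- stats::density(test[ref == 1], kernel = 'epanechnikov',
#'                      from = min(test), to = max(test), n = 512)
#' sel <- d0$x > mean(test[ref == 0]) & d0$x < mean(test[ref == 1])
#' ix <- d0$x[sel][which.min(abs(d0$y[sel] - d1$y[sel]))]
#' ui.nonpar(ref, test, intersection = ix)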
#'
#'@export
ui.nonpar <-
function(ref,
test,
UI.Se = .55,
UI.Sp = .55,
intersection = NULL,
return.first = TRUE,
select = c('nearest', 'limited'),
direction = c('auto','<', '>')) {
# helper: indices of the value(s) in M closest to crit
find.closest <- function(M, crit){
mindiff = min(abs(M - crit))
which((M == crit + mindiff) | (M == crit - mindiff), arr.ind = TRUE)
}
bootstrap = 0 # experimental; fixed at 0, so the bootstrap branch below is currently disabled
select = match.arg(select)
direction = match.arg(direction)
df = check.data(ref, test, model='kernel')
if (direction == 'auto'){
if (mean(df$test[df$ref==0]) > mean(df$test[df$ref==1])) {
direction = '>'
} else {
direction = '<'
}
}
# only one relevant intersection is assumed!
# kernel density estimation is used to determine the point of intersection
if (is.null(intersection)) {
intersection = get.intersection(df$ref, df$test, model='kernel')
if (length(intersection) > 1) {
intersection = utils::tail(intersection, n = 1)
warning('More than one point of intersection. Highest density used. \n')
} else intersection=intersection[1] # other values are ignored
}
negate = (direction == '>')
if (negate) {
df$test = -df$test
intersection = -intersection
}
if (bootstrap > 0) {
o4 = ui.nonpar(df$ref, df$test, UI.Se, UI.Sp, intersection, return.first=TRUE,
select)
n = nrow(df)
for (i in 1:bootstrap){
sa = sample(n, replace=TRUE)
o5 = ui.nonpar(df$ref[sa], df$test[sa], UI.Se, UI.Sp, intersection, return.first=TRUE,
select)
o4 = rbind(o4,o5)
}
if (return.first){
return(list(sample.est=o4[1,], bootstrap.est=colMeans(o4[-1,])))
} else {
return(list(sample.est=o4[1,], bootstrap.est=o4[-1,]))
}
}
if (UI.Se < .5) stop('Value < .5 invalid for UI.Se')
if (UI.Sp < .5) stop('Value < .5 invalid for UI.Sp')
# if (UI.Se > .6) warning('Value > .6 not recommended for UI.Se')
# if (UI.Sp > .6) warning('Value > .6 not recommended for UI.Sp')
tt <- table(df$test, df$ref) # frequencies of the sorted test values per group
pred = sort(unique(df$test))
wi0 = rev(which(pred < intersection)) # count backwards from the intersection
wi1 = which(pred >= intersection) # count forwards from the intersection
# candidate lower bounds: cumulative TN and FN counts, moving down from the intersection
o0 = matrix(NA, ncol = 3, nrow = length(wi0))
colnames(o0) = c('cp.l', 'TN', 'FN')
o0[, 'cp.l'] = pred[wi0]
o0[, 'TN'] = cumsum(tt[wi0, 1])
o0[, 'FN'] = cumsum(tt[wi0, 2])
# candidate upper bounds: cumulative TP and FP counts, moving up from the intersection
o1 = matrix(NA, ncol = 3, nrow = length(wi1))
colnames(o1) = c('cp.h', 'FP', 'TP')
o1[, 'cp.h'] = pred[wi1]
o1[, 'TP'] = cumsum(tt[wi1, 2])
o1[, 'FP'] = cumsum(tt[wi1, 1])
# find rows where TN/(TN+FP) <= UI.Se, i.e. TN <= FP * UI.Se / (1 - UI.Se)
value = UI.Se / (1 - UI.Se)
if (select == 'limited') {
res0 = lapply(o0[, "TN"], function(r) {
a = which(r <= o1[, "FP"] * value)
ifelse(length(a) == 0, return(NA), return(a) )})
} else {
res0 = lapply(o0[, "TN"], function(r) {
a = find.closest(o1[, "FP"] * value, r)
ifelse(length(a) == 0, return(NA), return(a) )} )
}
# create matrix from list of candidate indices
m = t(sapply(res0, '[', 1:max(sapply(res0, length))))
# find rows with at least one valid finding
rcp.l = which(rowSums(is.na(m)) != ncol(m))
cp.l = o0[rcp.l, 'cp.l'] # candidate lower bounds
# find candidate upper bounds for each candidate lower bound
cp.h = matrix(o1[m[rcp.l, ], 'cp.h'], nrow = length(cp.l))
df = data.frame(cbind(cp.l, cp.h))
df.l = melt(df, id = 'cp.l') # long format: one row per (cp.l, cp.h) pair
df.l = stats::na.omit(df.l)
if (ncol(df.l) != 3)
{ oo1 =
data.frame(
'cp.l' = NA,
'cp.h' = NA,
'FN' = NA,
'TP' = NA,
'TN' = NA,
'FP' = NA
) } else {
colnames(df.l) <- c('cp.l', 'variable', 'cp.h')
df.l = df.l[order(df.l$cp.l), c('cp.l', 'cp.h')]
m.cp.l = match(df.l$cp.l, o0[, 'cp.l'])
m.cp.h = match(df.l$cp.h, o1[, 'cp.h'])
oo1 = data.frame(
'cp.l' = df.l$cp.l,
'cp.h' = df.l$cp.h,
'FN' = o0[m.cp.l, 'FN'],
'TP' = o1[m.cp.h, 'TP'],
'TN' = o0[m.cp.l, 'TN'],
'FP' = o1[m.cp.h , 'FP']
)
oo1$UI.Se = oo1$TP / (oo1$TP + oo1$FN)
oo1$UI.Sp = oo1$TN / (oo1$FP + oo1$TN) # head(oo1)
if (select == 'limited') {
oo1 = oo1[oo1$UI.Sp <= UI.Sp &
oo1$UI.Se <= UI.Se &
stats::complete.cases(oo1), ]
} else {
oo1 = oo1[stats::complete.cases(oo1), ]
}
oo1 = oo1[!duplicated(oo1[, c('FN', 'TP', 'TN', 'FP')]), ] # nrow(o1)
}
# find rows where TP/(TP+FN) <= UI.Sp
if (select == 'limited') {
res1 = lapply(o1[, "TP"], function(r) {
a = which(r <= o0[, "FN"] * UI.Sp / (1 - UI.Sp))
ifelse(length(a) == 0, return(NA), return(a)) })
} else {
res1 = lapply(o1[, "TP"], function(r) {
a = find.closest(o0[, "FN"] * value, r)
ifelse(length(a) == 0, return(NA), return(a)) })
}
# create matrix from list
m = t(sapply(res1, '[', 1:max(sapply(res1, length))))
rcp.h = which(rowSums(is.na(m)) != ncol(m))
cp.h = o1[rcp.h, 'cp.h'] # candidates cp.h
# find candidates for cp.l
cp.l = matrix(o0[m[rcp.h, ], 'cp.l'], nrow = length(cp.h))
df = data.frame(cbind(cp.l, cp.h))
df.h = melt(df, id = 'cp.h')
df.h = stats::na.omit(df.h)
if (ncol(df.h) != 3) {
oo2 =
data.frame(
'cp.l' = NA,
'cp.h' = NA,
'FN' = NA,
'TP' = NA,
'TN' = NA,
'FP' = NA
)
} else {
colnames(df.h) <- c('cp.h', 'variable', 'cp.l')
df.h = df.h[order(df.h$cp.h), c('cp.l', 'cp.h')]
m.cp.l = match(df.h$cp.l, o0[, 'cp.l'])
m.cp.h = match(df.h$cp.h, o1[, 'cp.h'])
oo2 = data.frame(
'cp.l' = df.h$cp.l,
'cp.h' = df.h$cp.h,
'FN' = o0[m.cp.l, 'FN'],
'TP' = o1[m.cp.h, 'TP'],
'TN' = o0[m.cp.l, 'TN'],
'FP' = o1[m.cp.h , 'FP']
)
oo2$UI.Se = oo2$TP / (oo2$TP + oo2$FN)
oo2$UI.Sp = oo2$TN / (oo2$FP + oo2$TN)
if (select == 'limited') {
oo2 = oo2[oo2$UI.Sp <= UI.Sp &
oo2$UI.Se <= UI.Se & stats::complete.cases(oo2),]
} else {
oo2 = oo2[stats::complete.cases(oo2),]
}
oo2 = oo2[!duplicated(oo2[, c('FN', 'TP', 'TN', 'FP')]),] # nrow(o2)
}
o3 = rbind(oo1, oo2) # combine the candidate solutions of both searches
o3 = o3[stats::complete.cases(o3), ]
o3 = o3[order(-o3$cp.h, o3$cp.l), ]
o3 = o3[!duplicated(o3[, c('cp.l', 'cp.h')]), ]
o3 = o3[!duplicated(o3[, c('FN', 'TP', 'TN', 'FP')]), ]
if (select == 'nearest') {
o3 = o3[find.closest(abs(o3[, 'UI.Se'] - UI.Se) +
abs(o3[, 'UI.Sp'] - UI.Sp), 0),]
}
if (negate) {
# map the bounds back to the original direction of the test scores
temp = o3[,1] # cp.l
o3[,1] = -o3[,2]
o3[,2] = -temp
intersection = -intersection
}
if (return.first || nrow(o3) == 0)
return(c(unlist(o3[1, ]), intersection = intersection))
else
return(t(do.call(rbind, o3)))
}