# R/ui.ordinal.R In UncertainInterval: Uncertain Interval Methods for Three-Way Cut-Point Determination in Test Results

#### Documented in ui.ordinal

#' Function to explore possible uncertain intervals of ordinal test results of
#' individuals with (1) and without (0) the targeted condition.
#' @name ui.ordinal
#' @importFrom grDevices rgb
#' @importFrom graphics legend plot
#' @description This function is intended to be used for ordinal tests with a
#'   small number of distinct test values (for instance 20 or less). This
#'   function explores possible uncertain intervals (UI) of the test results of
#' the two groups. This function allows for considerable fine-tuning of the
#'   characteristics of the interval of uncertain test scores, in comparison to
#'   other functions for the determination of the uncertain interval and is
#'   intended for tests with a limited number of ordered values and/or small
#'   samples.
#'
#'   When a limited number of distinguishable scores is available,
#'   estimates will be coarse. When more than 20 values can be distinguished
#'   and a sufficiently large data set is available, the function \code{\link{RPV}}
#'   may be preferred for the analysis of discrete ordered data.
#' @param ref The reference standard. A column in a data frame or a vector
#'   indicating the classification by the reference test. The reference standard
#'   must be coded either as 0 (absence of the condition) or 1 (presence of the
#'   condition). When \code{mean(test[ref == 0]) > mean(test[ref == 1])} it is
#'   assumed that lower test scores indicate presence of the condition,
#'   otherwise that higher test scores indicate presence of the condition.
#' @param test The test or predictor under evaluation. A column in a data set or
#'   vector indicating the test results on an ordinal scale.
#' @param select.max Selects the candidate thresholds on basis of a desired
#'   property of the More Certain Intervals (MCI). The criteria are: maximum
#'   Se+Sp (default), maximum C (AUC), maximum Accuracy, maximum Sp, maximum Se,
#'   maximum size of MCI. The last alternative 'All' selects all possible
#'   candidates.
#' @param constraints Sets upper constraints for various properties of the
#'   uncertain interval: C-statistic (AUC), Acc (accuracy), lower and upper
#'   limit of the ratio of the proportions with and without the targeted
#'   condition. The default values are C = .57, Acc = .6, lower.ratio = .8,
#'   upper.ratio = 1.25. These values implement the desired uncertainty of the
#'   uncertain interval. The value of C (AUC) is considered the most important
#'   and has the most restrictive default value. For Acc and C, the values
#'   closest to the desired value are found and then all smaller values are
#'   considered. The other two constraints are straightforward lower and upper
#'   limits of the ratio between the number of patients with and without the
#'   targeted disease. If you want to change the values of these constraints, it
#'   is necessary to name all values. C = 1 or Acc = 1 excludes C or accuracy,
#'   respectively, as a selection criterion. If no solution is found, the best
#'   available candidate is shown together with a warning message.
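The constraints parameter can be illustrated with a short sketch (assuming `ref` and `test` are defined as described above; the chosen values are illustrative only):

```r
# relax the accuracy constraint while keeping the other defaults;
# name all four values when changing the constraints
ui.ordinal(ref, test,
           constraints = c(C = .57, Acc = .7, lower.ratio = .8, upper.ratio = 1.25))
# C = 1 (or Acc = 1) drops that criterion from the selection
```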
#' @param weights (Default = c(1, 1, 1)). Vector with weights for the loss
#'   function. weights[1] is the weight of false negatives, weights[2] is the
#'   weight for loss in the uncertain interval (deviations from equal chances to
#'   belong to either distribution), and weights[3] is the weight for false
#'   positives. When a weight is set to a larger value, thresholds are selected
#'   that make the corresponding error smaller, while the corresponding area
#'   grows smaller.
#' @param intersection (Default = NULL). Optional value to be used as the
#'   intersection. If no value is supplied, the intersection is calculated
#'   using the function \code{get.intersection(ref = ref, test = test,
#'   model = 'ordinal')}, which provides a Gaussian kernel estimate of the
#'   intersection.
#' @param return.all (Default = FALSE). When TRUE, \code{$data.table} and
#'   \code{$uncertain.interval} are included in the output.
#' @param ... Further parameters that can be transferred to the density
#'   function.
#' @details Due to the limited possibilities of short scales, it is more
#'   difficult to determine a suitable uncertain interval when compared to
#'   longer scales. This problem is aggravated when samples are small. For any
#'   threshold determination, one needs a large representative sample (200 or
#'   larger). If there are no test scores below the intersection in the
#'   candidate uncertain area, Sp of the Uncertain Interval (UI.Sp) is not
#'   available, while UI.Se equals 1. The essential question is always whether
#'   the patients with the test scores inside the uncertain interval can be
#'   sufficiently distinguished. The candidate intervals are selected on various
#'   properties of the uncertain interval. The defaults are C (AUC) at most
#'   .57, Acc (accuracy) at most .6, and the ratio of proportions of persons
#'   with / without the targeted condition between .8 and 1.25. These criteria
#'   ensure that all candidates for the uncertain interval have insufficient
#'   accuracy. The second criterion is the desired property of the More Certain
#'   Intervals (see select.max parameter). The model used is 'ordinal'. For
#'   this model, the default of the adjust parameter sent to the density
#'   function is 2, but you can enter another value such as adjust = 1.
#'
#'   Dichotomous thresholds are inclusive: the threshold itself is counted
#'   among the positive scores (patients). The count of positive scores is
#'   therefore >= threshold when the mean score for ref == 0 is lower than for
#'   ref == 1, and <= threshold when the mean score for ref == 0 is higher.
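This counting rule can be sketched in a few lines of R (the threshold `t` and the vectors `ref` and `test` are assumed to exist; this is an illustration, not part of the function):

```r
# count positive classifications for an inclusive threshold t
n.pos <- if (mean(test[ref == 0]) < mean(test[ref == 1])) {
  sum(test >= t)  # higher scores indicate the condition
} else {
  sum(test <= t)  # lower scores indicate the condition
}
```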
#'
#'   Both the Youden threshold and the (default used) gaussian kernel estimate
#'   of the intersection are estimates of the true intersection. In some
#'   circumstances the Youden threshold can be preferred, especially when the
#'   data show spikes for the lowest and/or highest values. In many situations
#'   the Gaussian kernel estimate is to be preferred, especially when there is
#'   more than one intersection. In many situations the two estimates are close
#'   to each other, but especially for coarse data they might differ.
#'
#'   Discussion of the first example (please run the code first): Visual
#'   inspection of the mixed densities function \code{\link{plotMD}} shows that
#'   distinguishing patients with and without the targeted condition is almost
#'   impossible for test scores 2, 3 and 4. Sensitivity and Specificity of the
#'   uncertain interval should be not too far from .5. In the first example, the
#'   first interval (3:3) has no scores lower than the intersection (3), and
#'   therefore UI.Sp is not available and UI.Se = 1. The UI.ratio indicates
#'   whether the number of patients with and without the condition is equal in
#'   this interval. For these 110 patients, a diagnosis of uncertainty is
#'   probably the best choice. The second interval (3:4) has an UI.Sp of .22,
#'   which is a large deviation from .5. In this slightly larger interval, the
#'   patients with a test score of 3 have a slightly larger probability to
#'   belong to the group without the condition. UI.Se is .8. UI.ratio is close
#'   to 1, which makes it a feasible candidate. The third interval (2:4) has an
#'   UI.Sp of .35 and an UI.Se of .70 and an UI.ratio still close to one. The
#'   other intervals show either Se or Sp that deviate strongly from .5, which
#'   makes them unsuitable choices. Probably the easiest way to determine the
#'   uncertain interval is the interval with minimum loss. This is interval
#'   (2:4). Dichotomization loss L2 can be defined as the sum of false negatives
#'   and false positives. The Youden threshold minimizes these. The Loss formula
#'   L3 for trichotomization of ordinal test scores is (created by
#'   https://www.codecogs.com/latex/eqneditor.php): \deqn{L_3 =\frac{ \left
#'   (\sum_{i=l}^{u} \left |d0_{i}-d1_{i}  \right | + \sum_{i=1}^{l-1} d1_{i}+
#'   \sum_{i=u+1}^{h}d0_{i}\right )}{N}}{ L3 = 1/N * (sum(abs(d0[l:u] -
#'   d1[l:u])) + sum(d1[1:(l-1)]) + sum(d0[(u+1):h]))} where \emph{d0}
#'   represents the test scores of the norm group, \emph{d1} represents the test
#'   scores of the targeted patient group, \emph{l} is the lower limit of the
#'   uncertain interval, \emph{u} the upper limit, the first test score is
#'   enumerated 1 and the last test score is enumerated \emph{h}. \emph{N} is
#'   the total number of all persons with test scores. \itemize{
#'   \item{\eqn{\sum_{i=l}^{u} \left |d0_{i}-d1_{i}  \right |}{sum(abs(d0[l:u] -
#'   d1[l:u]))}}{ is the loss in the uncertain interval, that is, the total
#'   deviation from equality.} \item{\eqn{\sum_{i=1}^{l-1}
#'   d1_{i}}{sum(d1[1:(l-1)])}}{ is the loss in the lower More Certain Interval,
#'   that is, the total of False Negatives, the number of patients with the
#'   targeted condition with a test score lower than \emph{l}, and}
#'   \item{\eqn{\sum_{i=u+1}^{h} d0_{i}}{sum(d0[(u+1):h])}}{ is the loss in the
#'   upper More Certain Interval, that is, the total of False Positives, the
#'   number of patients without the targeted condition with a test score higher
#'   than \emph{u}.}}
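The loss formula can be computed directly from the two frequency vectors. A minimal sketch, assuming higher scores indicate the condition (the helper name `L3` and the direct call are illustrative; `d0` and `d1` are frequencies per ordered test score as defined above, and `l` and `u` index the uncertain interval):

```r
L3 <- function(d0, d1, l, u) {
  N <- sum(d0) + sum(d1)
  h <- length(d0)
  loss.ui <- sum(abs(d0[l:u] - d1[l:u]))           # deviation from equality in the UI
  loss.fn <- if (l > 1) sum(d1[1:(l - 1)]) else 0  # false negatives in the lower MCI
  loss.fp <- if (u < h) sum(d0[(u + 1):h]) else 0  # false positives in the upper MCI
  (loss.ui + loss.fn + loss.fp) / N
}
# with the frequencies of the first example and the interval (2:4):
# L3(d0 = c(165,14,16,55,10), d1 = c(15,11,13,55,164), l = 2, u = 4)
```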
#'
#'   Loss L is higher when the deviation from equality is higher in the
#'   uncertain area, higher when the number of False Negatives is higher, and
#'   higher when the number of False Positives is higher. The loss of a single
#'   threshold method equals 1 - its Accuracy. In this example, the minimum Loss
#'   is found with interval (2:4). As this agrees with values for UI.C and
#'   UI.ratio that sufficiently indicate the uncertainty of these test scores,
#'   this seems the most suitable choice: patients with test scores 2 to 4 are
#'   almost equally likely to come from either population. The
#'   remaining cases outside the uncertain interval (2:4) show high C, Accuracy,
#'   Specificity and Sensitivity.
#'
#' @return List of values:
#' \describe{
#' \item{$Youden}{A vector of statistics concerning the maximized Youden
#'   index:} \itemize{
#'   \item{max.Youden: }{The value of the Maximized Youden Index (= max(tpr -
#'   fpr)).}
#'   \item{threshold: }{The threshold associated with the Maximized Youden
#'   Index. Test values >= threshold indicate the targeted condition.}
#'   \item{Sp: }{The Specificity of the test when this threshold is applied.}
#'   \item{Se: }{The Sensitivity of the test when this threshold is applied.}
#'   \item{Acc: }{The Accuracy of the test when this threshold is applied.}
#'   \item{Loss: }{min(fnr + fpr) = min(1 - (Se + Sp - 1)) = 1 - max(tpr -
#'   fpr). Lower range (< threshold): the summed number of false positives for
#'   each test score, divided by the number of persons that have received that
#'   test score. Upper range (>= threshold): the summed number of false
#'   negatives, divided by the number of persons that have received that test
#'   score. The Youden Loss is equal to 1 - Youden.index.}
#'   \item{C: }{Concordance; equals AUROCC (Area Under Receiver Operating
#'   Characteristics Curve or AUC).} }
#' \item{$data.table}{A data.frame with the
#'   following columns:} \itemize{ \item{test: }{The test scores.} \item{d0:
#'   }{The frequencies of the test scores of the norm group.} \item{d1: }{The
#'   frequencies of the test scores of the group with the targeted condition.}
#'   \item{tot: }{The total frequency of each test scores.}
#'   \item{TP: }{The
#'   number of True Positives when this test score is used as threshold.}
#'   \item{FP: }{The number of False Positives when this test score is used as
#'   threshold.}
#'   \item{tpr: }{The true positive rate when this test score is
#'   used as threshold.}
#'   \item{fpr: }{The false positive rate when this test
#'   score is used as threshold.}
#'   \item{Y: }{The Youden Index (= tpr - fpr) when
#'   this test score is used as threshold.} }
#'   \item{$intersection}{The (rounded) intersection for the distributions of
#'   the two groups. Most often, these distributions have no true point of
#'   intersection and the rounded intersection is an approximation. Often, this
#'   equals the Maximized Youden threshold (see Schisterman 2005). Warning:
#'   When a limited range of scores is available, it is more difficult to
#'   estimate the intersection. Different estimates can easily differ plus or
#'   minus 1. When using a non-rounded value (for example 16.1), the effective
#'   threshold for the uncertain area is round(intersection + .5); in the
#'   mentioned example, 16.1 becomes 17.}
#'   \item{$uncertain.interval}{Data frame with the statistics of all possible
#'   bounds of the uncertain interval. The columns are the following: }
#'   \itemize{
#'   \item{lowerbound: }{Lower bound of the possible uncertain
#'   interval.}
#'   \item{upperbound: }{Upper bound of the possible uncertain
#'   interval.}
#'   \item{UI.Sp: }{Specificity of the test scores between and
#'   including the lower and upper boundary. Closer to .5 is 'better', that is,
#'   more uncertain. This estimate is rough and dependent on the intersection
#'   and cannot be recommended as a criterion for a short, ordinal scale. }
#'   \item{UI.Se: }{Sensitivity of the test scores between and including the
#'   lower and upper boundary. Closer to .5 is 'better', that is, more
#'   uncertain. This estimate is rough and dependent on the intersection and
#'   cannot be recommended as a criterion for a short, ordinal scale.}
#'   \item{UI.Acc: }{Accuracy of the test scores between and including the lower
#'   and upper boundary. Closer to .5 is 'better', that is, more uncertain. This
#'   estimate is rough and dependent on the intersection and cannot be
#'   recommended as a criterion for a short, ordinal scale.}
#'   \item{UI.C:
#'   }{Concordance (AUROC) of the test scores between and including the lower
#'   and upper boundary. Closer to .5 is 'better', that is, more uncertain. Rule
#'   of thumb: <= .6}
#'   \item{UI.ratio: }{The ratio between the proportion of
#'   patients in the uncertain area with and without the condition. Closer to
#'   one is 'better', that is, more uncertain; 0.8 < UI.ratio < 1.25 as a rule
#'   of thumb.}
#'   \item{UI.n: }{Number of patients with test scores between and
#'   including the lower and upper boundary.}
#'   \item{MCI.Sp: }{Specificity of the
#'   more certain interval, i.e., the test scores lower than the lower boundary
#'   and higher than the upper boundary.}
#'   \item{MCI.Se: }{Sensitivity of the
#'   test scores lower than the lower boundary and higher than the upper
#'   boundary.}
#'   \item{MCI.C: }{Concordance (AUROC) of the test scores outside
#'   the uncertain interval. For the MCI, closer to 1 is 'better', that is,
#'   more certain.}
#'   \item{MCI.Acc: }{Accuracy of the test scores lower
#'   than the lower boundary and higher than the upper boundary.} \item{MCI.n:
#'   }{Number of patients with test scores lower than the lower boundary and
#'   higher than the upper boundary.}
#'   \item{Loss: }{Loss of the
#'   trichotomization. The total loss is the sum of the loss of the three areas:
#'   lower MCI: the summed number of false positives for each test score,
#'   divided by the number of persons that have received that test score;
#'   uncertain interval: the sum of the absolute differences in the number of
#'   persons in the norm group (d0) and the number of persons in the group with
#'   the targeted condition (d1) per test score, divided by the total number of
#'   persons; upper MCI: the summed number of false negatives, divided by the
#'   number of persons that have received that test score. The Loss can be
#'   compared to the loss of the Youden threshold, provided that the
#'   intersection is equal to the Youden threshold. If necessary, this can be
#'   forced by attributing the value of the Youden threshold to the intersection
#'   parameter.}
#'   \item{$candidates: }{Candidates with a loss lower than the Youden loss
#'   which might be considered for the Uncertain Interval. The candidates are
#'   selected based on the constraints parameter, that defines the desired
#'   constraints of the uncertain area, and the select.max parameter, that
#'   selects the desired properties of the lower and upper More Certain
#'   Interval.} }
#'
#' @references{ Youden, W. J. (1950). Index for rating diagnostic tests.
#'   Cancer, 3(1), 32-35.
#'   https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
#'
#'   Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005). Optimal
#'   cut-point and its corresponding Youden Index to discriminate individuals
#'   using pooled blood samples. Epidemiology, 73-81.
#'
#'   Landsheer, J. A. (2016). Interval of Uncertainty: An alternative approach
#'   for the determination of decision thresholds, with an illustrative
#'   application for the prediction of prostate cancer. PLOS One.
#'
#'   Landsheer, J. A. (2018). The Clinical Relevance of Methods for Handling
#'   Inconclusive Medical Test Results: Quantification of Uncertainty in
#'   Medical Decision-Making and Screening. Diagnostics, 8(2), 32.
#'   https://doi.org/10.3390/diagnostics8020032 }
#' @seealso{ \code{\link{plotMD}} or \code{\link{barplotMD}} for plotting the
#'   mixed densities of the test values. \code{\link[stats]{density}} for the
#'   parameters of the density function. \code{\link{ui.nonpar}} or
#'   \code{\link{ui.binormal}} can be used when more than 20 values can be
#'   distinguished on the ordinal test scale. When a large data set for an
#'   ordinal test is available, one might consider \code{\link{RPV}}. }
#' @examples
#' # A short test with 5 ordinal values
#' test0 = rep(1:5, times=c(165,14,16,55, 10)) # test results norm group
#' test1 = rep(1:5, times=c( 15,11,13,55,164)) # test results of patients
#' ref = c(rep(0, length(test0)), rep(1, length(test1)))
#' test = c(test0, test1)
#' table(ref, test)
#' plotMD(ref, test, model="ordinal") # visual inspection
#' # In this case we may prefer the Youden estimate
#' ui.ordinal(ref, test, intersection="Youden", select.max="All")
#' # Same solution, but other layout of the results:
#' ui.ordinal(ref, test, select.max=c("MCI.Sp+MCI.Se", "MCI.C", "MCI.Acc",
#'                                    "MCI.Se", "MCI.Sp", "MCI.n"))
#' # Using a gaussian kernel estimate of the true intersection
#' # gives the same best result for the uncertain interval.
#' # The estimates for ui.Se, ui.Sp and ui.Acc differ for another intersection:
#' ui.ordinal(ref, test, select.max="All")
#'
#' nobs=1000
#' set.seed(6)
#' Z0 <- rnorm(nobs, mean=0)
#' b0=seq(-5, 8, length.out=31)
#' f0=cut(Z0, breaks = b0, labels = c(1:30))
#' x0=as.numeric(levels(f0))[f0]
#' Z1 <- rnorm(nobs, mean=1, sd=1.5)
#' f1=cut(Z1, breaks = b0, labels = c(1:30))
#' x1=as.numeric(levels(f1))[f1]
#' ref=c(rep(0,nobs), rep(1,nobs))
#' test=c(x0,x1)
#' plotMD(ref, test, model='ordinal') # looks like binormal
#' # looks less binormal, but in fact it is a useful approximation:
#' plotMD(ref, test, model='binormal')
#' ui.ordinal(ref, test)
#' ui.binormal(ref, test) # compare application of the bi-normal model
#' @export

# data(tostbegg2)
# ref=tostbegg2$d; test=tostbegg2$y
# intersection=3; weights=c(1,1,1);
# constraints=c(C=.57, Acc=.6, lower.ratio=.8, upper.ratio=1.25)
# select.max = c('MCI.Sp+MCI.Se', 'MCI.C', 'MCI.Acc', 'MCI.Se', 'MCI.Sp', 'MCI.n')
# select.max="All"
# test=-test

ui.ordinal <- function(ref, test,
                       select.max = c('MCI.Sp+MCI.Se', 'MCI.C', 'MCI.Acc',
                                      'MCI.Se', 'MCI.Sp', 'MCI.n', 'All'),
                       constraints=c(C=.57, Acc=.6, lower.ratio=.8, upper.ratio=1.25),
                       weights=c(1,1,1), intersection=NULL,
                       return.all = FALSE, ...){

find.closest <- function(M, crit){
  M[which(is.na(M))] = Inf # crit+10e10
  mindiff=min(abs(M-crit)) # warning: no non-missing arguments to min; returning Inf
  which((M == crit+mindiff) | (M == crit-mindiff), arr.ind=T)
}

controlshigher = mean(test[ref==0]) > mean(test[ref==1])

tlim = c(C=.57, Acc=.6, lower.ratio=.8, upper.ratio=1.25) # default
if (is.null(names(constraints))) {
  tlim[1:length(constraints)] = constraints
} else {
  if (all(names(constraints) %in% c("C", "Acc", "lower.ratio", "upper.ratio"))){
    tlim[names(constraints)] = constraints[names(constraints)]}
}
constraints=tlim
# if (!(exists(constraints['C'] & exists(constraints['Acc'])))) stop('constraints must be named correctly.')

select.max <- match.arg(select.max,
                        c('MCI.Sp+MCI.Se', 'MCI.C', 'MCI.Acc',
                          'MCI.Se', 'MCI.Sp', 'MCI.n', 'All'),
                        several.ok = TRUE)

d=simple_roc3(test[ref==0], test[ref==1]) # head(d)
totpos=length(test[ref==1]) # The total number of positives (one number)
totneg=length(test[ref==0]) # The total number of negatives (one number)
d$tot=rowSums(d[,c('d0','d1')]) # Number of patients with each test result
# d$TN=unname(cumsum(tab[,1]))
d$TN=totneg-d$FP
d$FN=totpos-d$TP
# d$fnr=d$FN/totpos
# 1-d$FP/totneg # 1 - specificity (false positives)
d$Y = d$tpr-d$fpr # Youden index
Yt= which.max(d$Y) # Youden threshold as row number
# Hmisc::find.matches(d$Y, max(d$Y), tol=c(.0001))
# d$Y[4] > d$Y[3]
# which( d$Y == max(d$Y) )

# getrangeTP(c(1,5))
if (controlshigher) r = 1:Yt else r = Yt:nrow(d) # r = true pos; -r = true neg
Acc.Y = (sum(d$d0[-r])+sum(d$d1[r]))/ (totpos+totneg)
# Conc = emp.AUC(test[ref==0], test[ref==1])
Conc = simple_auc(test[ref==0], test[ref==1]) # norm=test[ref==0]; abnorm=test[ref==1]

lr = nrow(d)
tab = as.matrix(d[,c('d0','d1')])
pt = tab/(totpos+totneg) # proportions of total N; used by the loss computations below
p0 = totneg/(totpos+totneg); p1=totpos/(totpos+totneg) # p0=p1=.5
# r0 = p0*pt[,1]/(p0*pt[,1]+p1*pt[,2])
# R = cbind(r0, r1 = 1-r0)

# A=Yt
Loss.Y = weights[1]*sum(pt[r, 1]) +  # FP
weights[3]*sum(pt[-r, 2])   # FN

# is = Yt # is = get.intersection(ref = ref, test = test, model='ordinal')
# intersection='Youden'
if (is.null(intersection)) {
is = get.intersection(ref = ref, test = test, model='ordinal', ...)
} else if (is.numeric(intersection)) {
is=intersection
} else if (intersection=="Youden"){
  is = d$test[Yt] # closest test value
}
isr = round(is[length(is)]) # tail has highest density
wr = which(d$test==isr) # row number
# isr=-4
drows=nrow(d)

# limits=c(1,5); getselTP(limits)
getselTP = function(limits){
selTP = logical(drows)
if (wr < limits[1] | wr > limits[2]) {NA
} else {if (controlshigher) {selTP[limits[1]:wr] = TRUE
} else {selTP[wr:limits[2]] = TRUE} }
selTP
}
upperbound = c(wr:drows) # row numbers, not test scores!
lowerbound = c(wr:1)
ui= as.data.frame(expand.grid(lowerbound,upperbound))
colnames(ui) <- c('lowerbound', 'upperbound')
ui.grid = ui[,1:2]

# x = c(3,3) # two values
fUI.Sp = function (x) {
sel = logical(drows); sel[x[1]:x[2]] = TRUE
selSp = !getselTP(x) & sel
sum(tab[selSp, 1]) / sum(tab[x[1]:x[2], 1])
}
fUI.Se = function (x){
selSe = getselTP(x);
sum(tab[selSe,2])/sum(tab[x[1]:x[2],2])
}
fUI.Acc = function(x) {
sel = logical(drows); sel[x[1]:x[2]] = TRUE
selTP = getselTP(x)
(sum(tab[!selTP & sel,1])+sum(tab[selTP,2]))/
sum(tab[(x[1]:x[2]),c(1,2)])
}
fUI.ratio = function (x){(sum(tab[x[1]:x[2],2])/totpos)/(sum(tab[x[1]:x[2],1])/totneg)}
# fUI.C = function (x){sel = test>=d$test[x[1]] & test<=d$test[x[2]] # cbind(test,sel)
#                         emp.AUC(test[sel & ref==0], test[sel & ref==1])} # ref=ref[sel]; test=test[sel]
fUI.C2 = function (x){x2 = sort(c(d$test[x[1]], d$test[x[2]]))
sel = (test>=x2[1] & test<=x2[2]) # cbind(test,sel)
simple_auc(test[sel & ref==0], test[sel & ref==1])} # ref=ref[sel]; test=test[sel]
f.UI.n = function(x){ sum(d$tot[x[1]:x[2]]) }

ui$UI.Sp = apply(ui.grid, 1, fUI.Sp) # x = unlist(ui[1,]); fUI.Sp(x)
ui$UI.Se = apply(ui.grid, 1, fUI.Se)
ui$UI.Acc = apply(ui.grid, 1, fUI.Acc)
ui$UI.ratio = apply(ui.grid, 1, fUI.ratio) # x=c(3,3); fUI.ratio(x)
# ui$UI.C2 = apply(ui.grid, 1, fUI.C)
ui$UI.C = apply(ui.grid, 1, fUI.C2) # x = unlist(ui.grid[2,])
ui$UI.n = apply(ui.grid, 1, f.UI.n)

f.Sp = function(x) {
if (controlshigher) r=-(1:x[2]) else r = -(x[1]:drows)
sum(d$d0[r])/sum(d$d0[-(x[1]:x[2])]) }
f.Se = function(x) {
if (controlshigher) r = -(x[1]:drows) else r = -(1:x[2])
sum(d$d1[r])/sum(d$d1[-(x[1]:x[2])]) }
f.Acc = function(x) {
tot = sum(d$d0[-(x[1]:x[2])]+d$d1[-(x[1]:x[2])])
if (controlshigher){
(sum(d$d0[-(1:x[2])])+sum(d$d1[-(x[1]:drows)]))/tot
} else {
(sum(d$d0[-(x[1]:drows)])+sum(d$d1[-(1:x[2])]))/tot
}
}
# f.C = function (x){sel = test>=d$test[x[1]] & test<=d$test[x[2]] # cbind(test,sel)
#    emp.AUC(test[!sel & ref==0], test[!sel & ref==1])} # ref=ref[sel]; test=test[sel]
f.C2 = function (x){
x2 = sort(c(d$test[x[1]], d$test[x[2]]))
sel = test>= x2[1] & test<=x2[2] # cbind(test,sel)
simple_auc(test[!sel & ref==0], test[!sel & ref==1])
} # ref=ref[sel]; test=test[sel]

f.Loss = function(x) {
if (controlshigher) {
weights[1]*sum(pt[-(x[1]:lr), 1]) + # FPR
weights[2]*sum(abs(pt[x[1]:x[2], 2] - pt[x[1]:x[2], 1])) + # errors for the uncertain interval
weights[3]*sum(pt[-(1:x[2]), 2])  # FNR
} else {
weights[1]*sum(pt[-(x[1]:lr), 2]) + # FNR
weights[2]*sum(abs(pt[x[1]:x[2], 1] - pt[x[1]:x[2], 2])) + # errors for the uncertain interval
weights[3]*sum(pt[-(1:x[2]), 1])  # FPR
}
}
f.MCI.n = function(x){ sum(d$tot[-(x[1]:x[2])]) } # x=c(4,4); f.Loss(x)

ui$MCI.Sp = apply(ui.grid, 1, f.Sp)
ui$MCI.Se = apply(ui.grid, 1, f.Se) # x=c(1,3); f.Se(x)
ui$MCI.Acc = apply(ui.grid, 1, f.Acc)
# ui$MCI.C = apply(ui.grid, 1, f.C)
ui$MCI.C = apply(ui.grid, 1, f.C2)
ui$MCI.n = apply(ui.grid, 1, f.MCI.n)
ui$Loss = apply(ui.grid, 1, f.Loss)

# ui$Random.loss= Acc.Y - ((round(ui$MCI.Acc*ui$MCI.n)+ # p1*ui$UI.n)/ (totpos+totneg))

# replace row numbers by test values
ui$lowerbound = d$test[ui$lowerbound]
ui$upperbound = d$test[ui$upperbound]

# sel = ui$Loss <= Loss.Y # ui2 = ui[sel,] # ui2 = ui2[order(ui2$Loss),]
C.lim = ui$UI.C[find.closest(ui$UI.C, constraints['C'])][1]
Acc.lim = ui$UI.Acc[find.closest(ui$UI.Acc, constraints['Acc'])][1]

# if (controlshigher){
#   d$test = -d$test
#   isr = -isr
#   ui[,'lowerbound']= -ui[,'lowerbound']
#   ui[,'upperbound']= -ui[,'upperbound']
#   colnames(ui)[1:2] = c('upperbound', 'lowerbound')
# }

sel = ui$UI.C <= C.lim & ui$UI.Acc <= Acc.lim
ui2 = ui[sel,]

sel = ui2$UI.ratio >= constraints['lower.ratio'] & ui2$UI.ratio <= constraints['upper.ratio']

if (sum(sel) >= 1) {ui2 = ui2[sel,]} else warning('No solution found for the ui ratio constraints. \n')
# select at least one

if ('All' %in% select.max){
  ui3 = ui2[order(ui2$Loss),]
} else {
  if ('MCI.Sp+MCI.Se' %in% select.max){
    sel = ui$Loss <= Loss.Y
if (sum(sel) == 0) warning('No solution found with less loss than Youden threshold. \n')
# select at least one

ui3 = ui2[which.max(unlist(ui2['MCI.Se']+ui2['MCI.Sp'])),]
rownames(ui3) <- c('MCI.Sp+MCI.Se')
select.max = select.max[!select.max %in% 'MCI.Sp+MCI.Se']
} else { ui3 = ui2[FALSE,] } # empty data.frame
i=1
while (select.max[i] %in% c('MCI.C', 'MCI.Se', 'MCI.Sp', 'MCI.Acc', 'MCI.n'))
{
temp = nrow(ui3)
ui3 = rbind(ui3,ui2[which.max(unlist(ui2[select.max[i]])),])
rownames(ui3)[temp+1] <- select.max[i]
i=i+1
}
}

if (return.all) { return(
list(#direction=direction,
Youden=c(max.Youden=d$Y[Yt], # max Youden index
         threshold=d$test[Yt], # cut-point test value
Sp=1-d$fpr[Yt], Se=d$tpr[Yt],Acc=Acc.Y, Loss=Loss.Y, C = Conc),
data.table = d,
intersection=isr,
uncertain.interval=ui,
candidates=ui3) )
} else  { return(
list(#direction=direction,
Youden=c(max.Youden=d$Y[Yt], # max Youden index
         threshold=d$test[Yt], # cut-point test value
sp=1-d$fpr[Yt], se=d$tpr[Yt],acc=Acc.Y, loss=Loss.Y, C = Conc),
intersection=isr,
candidates=ui3)
)
}

}



UncertainInterval documentation built on March 3, 2021, 1:10 a.m.