knitr::opts_chunk$set(fig.width = 6, fig.height = 5, fig.align = "center") options(rmarkdown.html_vignette.check_title = FALSE) load("vignettedata/vignettedata.Rdata")
cutpointr is an R package for tidy calculation of "optimal" cutpoints. It supports several methods for calculating cutpoints and includes several metrics that can be maximized or minimized by selecting a cutpoint. Some of these methods are designed to be more robust than the simple empirical optimization of a metric. Additionally, cutpointr can automatically bootstrap the variability of the optimal cutpoints and return out-of-bag estimates of various performance metrics.
You can install cutpointr from CRAN using the menu in RStudio or simply:
install.packages("cutpointr")
For example, the optimal cutpoint for the included data set is 2 when maximizing the sum of sensitivity and specificity.
library(cutpointr) data(suicide) head(suicide) cp <- cutpointr(suicide, dsi, suicide, method = maximize_metric, metric = sum_sens_spec)
summary(cp)
plot(cp)
When considering the optimality of a cutpoint, we can only make a judgement based on the sample at hand. Thus, the estimated cutpoint may not be optimal within the population or on unseen data, which is why we sometimes put the "optimal" in quotation marks.
cutpointr
makes assumptions about the direction of the dependency between
class
and x
, if direction
and / or pos_class
or neg_class
are not
specified. The same result as above can be achieved by manually defining direction
and
the positive / negative classes which is slightly faster, since the classes and direction
don't have to be determined:
opt_cut <- cutpointr(suicide, dsi, suicide, direction = ">=", pos_class = "yes", neg_class = "no", method = maximize_metric, metric = youden)
opt_cut
is a data frame that returns the input data and the ROC curve
(and optionally the bootstrap results) in a
nested tibble. Methods for summarizing and plotting the data and
results are included (e.g. summary
, plot
, plot_roc
, plot_metric
)
To inspect the optimization, the function of metric values per cutpoint can be
plotted using plot_metric
, if an optimization function was used that returns
a metric column in the roc_curve
column. For example, the maximize_metric
and minimize_metric
functions do so:
plot_metric(opt_cut)
Predictions for new data can be made using predict
:
predict(opt_cut, newdata = data.frame(dsi = 0:5))
The included methods for calculating cutpoints are:
maximize_metric
: Maximize the metric functionminimize_metric
: Minimize the metric functionmaximize_loess_metric
: Maximize the metric function after LOESS smoothingminimize_loess_metric
: Minimize the metric function after LOESS smoothingmaximize_spline_metric
: Maximize the metric function after spline smoothingminimize_spline_metric
: Minimize the metric function after spline smoothingmaximize_gam_metric
: Maximize the metric function after smoothing via Generalized Additive Modelsminimize_gam_metric
: Minimize the metric function after smoothing via Generalized Additive Modelsmaximize_boot_metric
: Bootstrap the optimal cutpoint when maximizing a metricminimize_boot_metric
: Bootstrap the optimal cutpoint when minimizing a metricoc_manual
: Specify the cutoff value manuallyoc_mean
: Use the sample mean as the "optimal" cutpointoc_median
: Use the sample median as the "optimal" cutpointoc_youden_kernel
: Maximize the Youden-Index after kernel smoothing
the distributions of the two classesoc_youden_normal
: Maximize the Youden-Index parametrically
assuming normally distributed data in both classesThe included metrics to be used with the minimization and maximization methods are:
accuracy
: Fraction correctly classifiedabs_d_sens_spec
: The absolute difference of sensitivity and specificityabs_d_ppv_npv
: The absolute difference between positive predictive
value (PPV) and negative predictive value (NPV)roc01
: Distance to the point (0,1) on ROC spacecohens_kappa
: Cohen's Kappasum_sens_spec
: sensitivity + specificitysum_ppv_npv
: The sum of positive predictive value (PPV) and negative
predictive value (NPV)prod_sens_spec
: sensitivity * specificityprod_ppv_npv
: The product of positive predictive value (PPV) and
negative predictive value (NPV)youden
: Youden- or J-Index = sensitivity + specificity - 1odds_ratio
: (Diagnostic) odds ratiorisk_ratio
: risk ratio (relative risk)p_chisquared
: The p-value of a chi-squared test on the confusion
matrixcost_misclassification
: The sum of the misclassification cost of
false positives and false negatives. Additional arguments: cost_fp, cost_fntotal_utility
: The total utility of true / false positives / negatives.
Additional arguments: utility_tp, utility_tn, cost_fp, cost_fnF1_score
: The F1-score (2 * TP) / (2 * TP + FP + FN)metric_constrain
: Maximize a selected metric given a minimal value of
another selected metricsens_constrain
: Maximize sensitivity given a minimal value of specificityspec_constrain
: Maximize specificity given a minimal value of sensitivityacc_constrain
: Maximize accuracy given a minimal value of sensitivityFurthermore, the following functions are included which can be used as metric
functions but are more useful for plotting purposes, for example in
plot_cutpointr
, or for defining new metric functions:
tp
, fp
, tn
, fn
, tpr
, fpr
, tnr
, fnr
, false_omission_rate
,
false_discovery_rate
, ppv
, npv
, precision
, recall
, sensitivity
, and specificity
.
The inputs to the arguments
method
and metric
are functions so that user-defined functions can easily
be supplied instead of the built-in ones.
Cutpoints can be separately estimated on subgroups that are defined by a third variable,
gender
in this case. Additionally,
if boot_runs
is larger zero, cutpointr
will carry out the usual cutpoint
calculation on the full sample, just as before, and additionally on
boot_runs
bootstrap samples. This offers a way of gauging the out-of-sample
performance of the cutpoint estimation method. If a subgroup is given,
the bootstrapping is carried out separately for every
subgroup which is also reflected in the plots and output.
set.seed(12) opt_cut_b <- cutpointr(suicide, dsi, suicide, boot_runs = 1000)
opt_cut_b
The returned object has the additional column boot
which is a nested tibble that
includes the cutpoints per bootstrap sample along with the metric calculated using
the function in metric
and
various default metrics. The
metrics are suffixed by _b
to indicate in-bag results or _oob
to indicate
out-of-bag results:
opt_cut_b$boot
The summary and plots include additional elements that summarize or display the bootstrap results:
summary(opt_cut_b) plot(opt_cut_b)
Using foreach
and doRNG
the bootstrapping can be parallelized easily. The
doRNG package is being used to make the bootstrap sampling reproducible.
library(doParallel) cl <- makeCluster(2) # 2 cores registerDoParallel(cl) registerDoRNG(12) # Reproducible parallel loops using doRNG opt_cut <- cutpointr(suicide, dsi, suicide, gender, pos_class = "yes", direction = ">=", boot_runs = 1000, allowParallel = TRUE) stopCluster(cl) opt_cut
By default, most packages only return the "best" cutpoint and disregard other cutpoints with quite similar performance, even if the performance differences are minuscule. cutpointr makes this process more explicit via the tol_metric
argument. For example, if all cutpoints are of interest that achieve at least an accuracy within 0.05
of the optimally achievable accuracy, tol_metric
can be set to 0.05
and also those cutpoints will be returned.
In the case of the suicide
data and when maximizing the sum of sensitivity and specificity, empirically the cutpoints 2 and 3 lead to quite similar performances. If tol_metric
is set to 0.05
, both will be returned.
library(tidyr) library(dplyr) opt_cut <- cutpointr(suicide, dsi, suicide, metric = sum_sens_spec, tol_metric = 0.05, break_ties = c) opt_cut |> select(optimal_cutpoint, sum_sens_spec) |> unnest(cols = c(optimal_cutpoint, sum_sens_spec))
Using the oc_manual
function the optimal cutpoint will not be determined
based on, for example, a metric but is instead set manually using the
cutpoint
argument. This is useful for supplying and evaluating cutpoints that were found
in the literature or in other external sources.
The oc_manual
function could also be used to set the cutpoint to the sample
mean using cutpoint = mean(data$x)
. However, this may introduce bias into the
bootstrap validation procedure, since the actual mean of the population is
not known and thus the mean to be used as the cutpoint should be automatically determined in every resample.
To do so, the oc_mean
and oc_median
functions can be used.
set.seed(100) opt_cut_manual <- cutpointr(suicide, dsi, suicide, method = oc_manual, cutpoint = mean(suicide$dsi), boot_runs = 1000) set.seed(100) opt_cut_mean <- cutpointr(suicide, dsi, suicide, method = oc_mean, boot_runs = 1000)
The arguments to cutpointr
do not need to be enclosed in quotes. This is
possible thanks to nonstandard evaluation of the arguments, which are
evaluated on data
.
Functions that use nonstandard evaluation are often not suitable for
programming with. The use of nonstandard evaluation may lead to scoping
problems and subsequent obvious as well as possibly subtle errors.
cutpointr uses tidyeval internally and accordingly the same rules as
for programming with dplyr
apply. Arguments can be unquoted with !!
:
myvar <- "dsi" cutpointr(suicide, !!myvar, suicide)
So far - which is the default in cutpointr
- we have considered all unique values of the predictor as possible cutpoints. An alternative could be to use a sequence of equidistant values instead, for example in the case of the suicide
data all integers in $[0, 10]$. However, with very sparse data and small intervals between the candidate cutpoints (i.e. a 'dense' sequence like seq(0, 10, by = 0.01)
) this leads to the uninformative evaluation of large ranges of cutpoints that all result in the same metric value. A more elegant alternative, not only for the case of sparse data, that is supported by cutpointr is the use of a mean value of the optimal cutpoint and the next highest (if direction = ">="
) or the next lowest (if direction = "<="
) predictor value in the data. The result is an optimal cutpoint that is equal to the cutpoint that would be obtained using an infinitely dense sequence of candidate cutpoints and is thus usually more efficient computationally.
This behavior can be activated by setting use_midpoints = TRUE
, which is the default. If we use this setting, we obtain an optimal cutpoint of 1.5 for the complete sample on the suicide
data instead of 2 when maximizing the sum of sensitivity and specificity.
Assume the following small data set:
dat <- data.frame(outcome = c("neg", "neg", "neg", "pos", "pos", "pos", "pos"), pred = c(1, 2, 3, 8, 11, 11, 12))
Since the distance of the optimal cutpoint (8) to the next lowest
observation (3) is rather large we arrive at a range of possible cutpoints that
all maximize the metric. In the case of this kind of sparseness it might for example be
desirable to classify a new observation with a predictor value of 4 as belonging
to the negative class. If use_midpoints
is set to TRUE
, the mean of the
optimal cutpoint and the next lowest observation is returned as the optimal
cutpoint, if direction is >=
. The mean of the optimal cutpoint and the next
highest observation is returned as the optimal cutpoint, if direction = "<="
.
opt_cut <- cutpointr(dat, x = pred, class = outcome, use_midpoints = TRUE) plot_x(opt_cut)
A simulation demonstrates more clearly that setting use_midpoints = TRUE
avoids
biasing the cutpoints. To simulate the bias of the metric functions, the
predictor values of both classes were drawn from normal distributions with
constant standard deviations of 10, a constant mean of the negative class of 100
and higher mean values of the positive class that are selected in such a way
that optimal Youden-Index values of 0.2, 0.4, 0.6, and 0.8 result in the population.
Samples of 9 different sizes were drawn and the cutpoints that maximize the
Youden-Index were estimated. The simulation was repeated 10000 times. As can be
seen by the mean error, use_midpoints = TRUE
eliminates the bias that is
introduced by otherwise selecting the value of an observation as the optimal
cutpoint. If direction = ">="
, as in this case, the observation that
represents the optimal cutpoint is the highest possible cutpoint that leads to the
optimal metric value and thus the biases are positive. The methods oc_youden_normal
and oc_youden_kernel
are always unbiased, as they don't select a cutpoint
based on the ROC-curve or the function of metric values per cutpoint.
plotdat_nomidpoints <- structure(list(sim_nr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L), method = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("emp", "normal", "loess", "boot", "spline", "spline_20", "kernel", "gam" ), class = "factor"), n = c(30, 30, 30, 30, 30, 30, 30, 30, 50, 50, 50, 50, 50, 50, 50, 50, 75, 75, 75, 75, 75, 75, 75, 75, 100, 100, 100, 100, 100, 100, 100, 100, 150, 150, 150, 150, 150, 150, 150, 150, 250, 250, 250, 250, 250, 250, 250, 250, 500, 500, 500, 500, 500, 500, 500, 500, 750, 750, 750, 750, 750, 750, 750, 750, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 30, 30, 30, 30, 30, 30, 30, 30, 50, 50, 50, 50, 50, 50, 50, 50, 75, 75, 75, 75, 75, 75, 75, 75, 100, 100, 100, 100, 100, 100, 100, 100, 150, 150, 150, 150, 150, 150, 150, 150, 250, 250, 250, 250, 250, 250, 250, 250, 500, 500, 500, 500, 500, 500, 500, 500, 750, 750, 750, 750, 750, 750, 750, 750, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 30, 30, 30, 30, 30, 30, 30, 30, 50, 50, 50, 50, 50, 50, 50, 50, 75, 75, 75, 75, 75, 75, 75, 75, 100, 100, 100, 100, 100, 100, 100, 100, 150, 150, 150, 150, 150, 150, 150, 150, 250, 250, 250, 250, 250, 250, 250, 250, 500, 500, 500, 500, 500, 500, 500, 500, 750, 750, 750, 750, 750, 750, 750, 750, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 30, 30, 30, 30, 30, 30, 30, 30, 50, 50, 50, 50, 50, 50, 50, 50, 75, 75, 75, 75, 75, 75, 75, 75, 100, 100, 100, 100, 100, 100, 100, 100, 150, 150, 150, 150, 150, 150, 150, 150, 250, 250, 250, 250, 250, 250, 250, 250, 500, 500, 500, 500, 500, 500, 500, 500, 750, 750, 750, 750, 750, 750, 750, 750, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000), mean_err = c(0.532157164015659, 0.0344907054484091, 1.09430750651166, 0.847845162156675, 1.72337372126503, 0.893756658507988, 0.0430309247027736, 0.785821459035346, 0.368063404388512, 0.0256197760404459, 0.54480529648463, 0.54385597929651, 0.657325657699579, 0.578611116865437, 0.0400491342691897, 0.515688005217413, 0.256713912589642, 0.0444582875885996, 0.326975493112402, 0.371128780921122, 0.473515115741104, 0.389519558405289, 0.105044360789378, 0.301924717299333, 0.207750921776918, -0.00318128936770314, 0.215170156089776, 0.27218780048926, 0.260519564021842, 0.236792923882582, 0.0209319074923902, 0.232957055204834, 0.0726605917614469, -0.00282823355849125, 0.0753216783313991, 0.147121931849656, 0.0986417724955371, 0.10048009778446, -0.0117861260923649, 0.0650845904350442, 0.0985144485083747, 0.00601003227249322, 0.107439979908118, 0.120777421732797, 0.098470489820427, 0.0940946984227826, 0.0340166854141625, 0.107851118082414, 0.0249685210781582, -0.00275219600614378, 0.0258069390207201, 0.0303381972274654, -0.000994602151869198, 0.00196854833683764, -0.0172319489562159, 0.0230957871932473, 0.00787486424680835, -0.018438041997315, -0.000808033567394628, -0.00151904153864496, -0.0258118523805697, -0.020984156892953, -0.0411584927473141, -0.0462075435919094, 0.0481149217843661, -0.0115241085997692, 0.0278045708419731, 0.0358588316426625, 0.0424473909450939, 0.0379773233328197, 0.0298772985879321, 0.0494939492036379, 0.561286668337778, 0.0210874502384648, 0.607711822769155, 0.944733906256477, 1.32069801051061, 0.623604782428556, 0.0138075109769806, 0.640859854412358, 0.284873604303057, -0.0170357985365701, 0.273426417633118, 0.524432737895336, 0.355003110979807, 0.312837607951434, -0.0316296929553873, 0.270109834098986, 0.174110335581819, 0.0253101719615279, 0.199956222742702, 0.375485416120771, 0.278956745944806, 0.244245525945888, 0.0325126314233263, 0.253352659868514, 0.154840760004461, 0.00231589709639472, 0.154412480165179, 0.264847742842386, 0.196572744185608, 0.182934783774992, 0.0207139021497755, 0.17351041376412, 0.14190910348156, -0.000766834484010096, 0.159975205214477, 0.191222128926019, 0.119768252112669, 0.12033372914036, -0.00429047209392067, 0.120982527821078, 0.0756304869484406, -0.00890884219048113, 0.0727782693168392, 0.118690444738942, 0.0814898789647033, 0.0799724348001957, 0.0182926240912726, 0.0887155007804252, 0.00799604720502299, 0.000599148388616836, -0.00567769035990384, 0.0358412514670032, 0.0308474979074875, 0.0341668723768997, -0.000180318451026095, 0.0180733341290925, 0.00456876236626807, 0.00150574966485876, 0.011152095953916, 0.0176039119729626, 0.00608274255434991, 0.0146257828313115, -0.0108877417404102, 0.00341198000323035, -0.00198459880370283, 0.0026551895445694, 0.00199040664534129, 0.0150165794544221, 0.00646287144368147, 0.00999205240904708, -0.00850278571195971, 0.000833666619266177, 0.714730067273087, -0.00916546079360956, 0.662799490366986, 1.18552468844156, 1.25901933062308, 0.672701515532179, 0.0311066140197676, 0.699068058809396, 0.451908043813962, 0.0131716226592205, 0.429551369887738, 0.697928133235757, 0.445367768988423, 0.408463448982185, 0.0318707241721211, 0.406284953982951, 0.257736307754364, -0.00525924423719458, 0.236977000055322, 0.429144726596141, 0.291381752107184, 0.267557606428613, 0.0103657879176852, 0.254728590646094, 0.187783398487578, -0.00216064381479362, 0.209025103860707, 0.318293592390017, 0.216751610346408, 0.195630579126633, -0.00355723971644246, 0.174111826408428, 0.151010324964235, 0.0152409223899092, 0.159002511320467, 0.214643583389694, 0.136211731513269, 0.138948149207635, 0.00736196817594524, 0.115637867729083, 0.0491348055596302, -0.00133957946235943, 0.0507437758212659, 0.103956325245849, 0.0641182216839426, 0.0721933081297794, -0.0124376134651938, 0.0632317888879588, 0.0322195712438111, 0.00170122889182022, 0.0287526766624194, 0.0589662164030242, 0.0348535721709848, 0.039527944642463, -0.00617539706415593, 0.0274246010641889, 0.0325877909680824, 0.00530528253248245, 0.0221776555499961, 0.0389702052631117, 0.0221602091288215, 0.0254478639695596, -0.0016189234058987, 0.0197144417326668, -0.00632485262604172, -0.00364979854195596, -0.00276076468388984, 0.0126267527301874, 0.0123592498266038, 0.0154921632247644, -0.00591512196680815, 0.0098685016547149, 1.19276916750486, -0.0296831640401583, 0.99406393593888, 1.95758669445116, 1.45842790978446, 0.916899913902239, -0.0240222410217233, 1.00771193034927, 0.748151091428865, 0.001671855025917, 0.665180535306263, 1.1777049634557, 0.578603609273264, 0.546625362141714, 0.0292152981607387, 0.615230814912951, 0.417886753756131, 0.00324593885807739, 0.406076310942717, 0.732191741449251, 0.352684769616612, 0.326901376027897, -0.000759357576337989, 0.350075431324921, 0.310927617707656, -0.0107255472998434, 0.28102101085112, 0.514683023356017, 0.24913510139508, 0.235155452507568, -0.0220885572014814, 0.243370611433649, 0.209652330609093, -0.00502865663759991, 0.2172246261346, 0.356540958804122, 0.172121720418057, 0.17487914828986, 0.00365942442127361, 0.176594681455494, 0.126956927057327, -0.00270525933073803, 0.120116234221594, 0.210827536708082, 0.101520409193932, 0.101379097920023, 0.00238043252144371, 0.113027315928011, 0.0598624378953727, -0.00538838415690431, 0.0568400730102315, 0.0978115258288965, 0.0454207684906316, 0.0473140143579152, -0.00165813015281622, 0.0521772135812508, 0.0530224090961669, -0.000416993405198653, 0.0353236911458531, 0.0605493601241619, 0.0316204159297213, 0.0344789374555544, -0.00446984887315054, 0.0328807595695966, 0.0396438546423947, -0.00331466369719113, 0.0379029847219126, 0.0572435100638761, 0.0253269328104989, 0.0235663211070417, 0.00220241478536399, 0.0307132312422208), youden = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8 )), row.names = c(NA, -288L), group_sizes = c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L ), biggest_group_size = 8L, class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(sim_nr = 1:36, .rows = list(1:8, 9:16, 17:24, 25:32, 33:40, 41:48, 49:56, 57:64, 65:72, 73:80, 81:88, 89:96, 97:104, 105:112, 113:120, 121:128, 129:136, 137:144, 145:152, 153:160, 161:168, 169:176, 177:184, 185:192, 193:200, 201:208, 209:216, 217:224, 225:232, 233:240, 241:248, 249:256, 257:264, 265:272, 273:280, 281:288)), row.names = c(NA, -36L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
library(ggplot2) ggplot(plotdat_nomidpoints %>% filter(!(method %in% c("spline_20"))), aes(x = n, y = mean_err, color = method, shape = method)) + geom_line() + geom_point() + facet_wrap(~ youden, scales = "fixed") + scale_shape_manual(values = 1:nlevels(plotdat_nomidpoints$method)) + scale_x_log10(breaks = c(30, 50, 75, 100, 150, 250, 500, 750, 1000)) + ggtitle("Bias of all methods when use_midpoints = FALSE", "normally distributed data, 10000 repetitions of simulation")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.