knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(screenr)
data(unicorns)
data(val_data)
The screenr package eases the development and validation of pre-testing screening tools of the sort reviewed by Clemens et al [1], but is applicable to any test-screening problem. Universal testing for a condition can be impractical if the test procedure is relatively expensive and the condition is rare. Alternatively, testing only those subjects having a sufficiently large probability of testing positive may be viable if a screening procedure can be developed which has an acceptable sensitivity. That may be possible if easily measured/recorded risk factors for test positivity can be identified.
This vignette demonstrates the development and validation of such a screening tool using lasso-like L1 regularization [2] of logistic models, as implemented in the glmpath package [3], and also model fitting using maximum-likelihood estimation. Regularization enables robust model selection, but the parameter estimates are biased toward zero. Therefore one might choose to re-fit the model selected by regularization using maximum-likelihood estimation. Model performance is evaluated using receiver-operating characteristics (ROC) [4,5].
Unicorns suffer an epidemic of infection by the Unicorn Immunodeficiency Virus (UIV). There is currently no cure for UIV infection, but highly effective antiretroviral therapy has the potential to block forward transmission and to avert death from opportunistic infections associated with UIV infection. Therefore it is critical that UIV infection is detected, and that infected unicorns are enrolled in treatment. However, UIV infection is sufficiently rare that universal testing of unicorns at health-care entry points is impractical.
A sample of `r dim(unicorns)[1]` properly consented adult unicorns were enrolled in a study aimed at evidence-based targeting of testing for UIV infection. The subjects were administered a questionnaire identifying the presence or absence of putative risk factors for UIV infection. The prevalence of UIV is low, and therefore universal testing was deemed impractical. The challenge then is to identify unicorns who should be prioritized for testing. Because UIV is transmissible and fatal if left untreated, it is important that the screening tool have an acceptably high sensitivity. The screenr package enables development and validation of such screening tools.
The screening questionnaire included seven questions, selected based on published information on infection risk. The data consist of the responses to the screening questions Q1, ..., Q7 (coded 1 for an affirmative response and 0 for a negative response) and the testing outcome (testresult), again coded 0 and 1 for negative and positive results, respectively.
NOTE: It is critically important that all of the putative risk factors have a positive association with the outcome. That is, the questionnaire must be designed so that affirmative (yes) responses are hypothesized to indicate an increased risk of UIV infection.
The data from the unicorn study look like:
## The first six lines of the unicorn screening data:
head(unicorns)
The prevalence of UIV in the sample of unicorns is `r mean(unicorns$testresult, na.rm = TRUE)`. Under universal testing at health-care entry points, discovery of a single infection would require, on average, testing `r round(1 / mean(unicorns$testresult, na.rm = TRUE))` unicorns.
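The arithmetic behind that statement is simply the reciprocal of prevalence. A minimal sketch, using a toy prevalence rather than the value in the unicorn data:

```r
## Under universal testing, the expected number of tests per infection
## found is 1 / prevalence.  The prevalence below is a toy value:
prev <- 0.015
tests_per_infection <- round(1 / prev)
tests_per_infection
```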
This demonstration of screening-tool development and validation consists of five steps: (1) selection of a parsimonious logistic screening model, (2) re-fitting of the selected model using maximum-likelihood estimation, (3) selection of a screening threshold, (4) external validation on entirely new data, and (5) implementation.
In practice, unfortunately, the fourth step is sometimes omitted due to resource limitations.
The first step is the selection of a parsimonious logistic screening model. A method known as L1 regularization has desirable properties for that task [2,3]. The function `lasso_screenr()` provides easy, integrated access to the L1 regularization algorithm of Park and Hastie [3], as implemented in the glmpath R package, using a convenient formula interface:
uniobj1 <- lasso_screenr(testresult ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7, data = unicorns, Nfolds = 10, seed = 123)
The formula `testresult ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7` is an expression of the statement "predict `testresult` from the seven covariates `Q1`, ..., `Q7`." The argument `Nfolds = 10` specifies the desired number of partitions (folds) to be used for cross-validation (discussed under 'Selection of a screening threshold', below). The optional `seed` argument specifies the starting value for the random-number generator used for cross-validation data-splitting. Here it is set to a fixed value to ensure reproducibility. By default, the seed is set by the system clock, and therefore results will vary from run to run. Set the seed only if reproducible results are needed.
The fitting algorithm computes logit-scale coefficients along the entire regularization path, and has the effect of selecting covariates based on penalization by the L1-norm of their coefficients, similar to the ordinary lasso [2]. Unlike the ordinary lasso, the algorithm also imposes a very small penalty on the L2-norm, which eliminates adverse effects of any strong correlations among the predictors and provides useful estimates even if the response variable is separable by the predictors (albeit unlikely in test-screening applications).
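Concretely, and in our own notation rather than glmpath's, the coefficients along the path minimize a penalized negative log-likelihood in which $\lambda$ controls the L1 penalty and a small, fixed $\epsilon$ controls the L2 penalty:

$$\min_{\beta_0,\, \beta} \; -\ell(\beta_0, \beta) + \lambda \lVert \beta \rVert_1 + \epsilon \lVert \beta \rVert_2^2$$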
IMPORTANT: The object returned by `lasso_screenr` (or any of the other "_screenr" functions) contains information critical to validation and implementation of test screening. Therefore the object should be saved for future use, for example:
saveRDS(uniobj1, file = "uniobj1.rds")
The resulting `lasso_screenr`-class object (`uniobj1`) includes the results from regularization and performance measures from the fits to all of the data using k-fold cross-validation. It contains all of the information needed to develop and (internally) validate a screening tool for diagnostic tests. The screenr package provides the following methods for `lasso_screenr`-class objects:
methods(class = "lasso_screenr")
The methods should make it mostly unnecessary to access components of `lasso_screenr`-class objects directly using R code similar to `object$component`.
For example, the `summary` method provides basic information about the results of regularization:
summary(uniobj1)
Regularization yielded eight logistic models. Two models along the regularization path, those yielding the smallest AIC and BIC values and denoted `"minAIC"` and `"minBIC"`, respectively, are given special status. Those are likely the most useful solutions in nearly all settings. The default for pAUC is the area under the ROC curve for which sensitivity is at least 0.8, but see the results of `help(lasso_screenr)` for other options. However, all solutions are accessible from `lasso_screenr`-class objects.
The logit-scale coefficients for the two special models are obtained using:
coef(uniobj1)
Note that the coefficients for Q4 have been shrunk to zero in both the AIC- and BIC-best models, which happen to coincide in the unicorn data. In effect, Q4 has been eliminated from both of those models. Q4 was not an important predictor of the test result. One can also obtain the adjusted odds ratios and/or omit the intercept. For example, the adjusted odds ratios for the seven predictors can be obtained using:
coef(uniobj1, or = TRUE, intercept = FALSE)
The adjusted odds ratios for Q4 are 1.0, again indicating that Q4 was not predictive of the test result. One can also examine the coefficients everywhere the active set changed along the regularization path for either of the special fits by extracting and plotting the regularization path object:
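The relationship between the two scales is simply exponentiation: an adjusted odds ratio is the exponential of the corresponding logit-scale coefficient, so a coefficient shrunk to zero maps to an odds ratio of exactly 1. A minimal sketch with hypothetical coefficient values (not the unicorn estimates):

```r
## Hypothetical logit-scale coefficients; Q4's has been shrunk to zero:
b <- c(Q3 = 1.5, Q4 = 0)

## Adjusted odds ratios are the exponentiated coefficients:
or <- exp(b)
or[["Q4"]]  # a zero coefficient gives an odds ratio of 1
```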
pathobj <- get_what(from = uniobj1, what = "glmpathObj", model = "minAIC")
plot(pathobj, xvar = "lambda", main = "")
The bottom horizontal axis is the regularization parameter $\lambda$, and the vertical axis shows values of the individual coefficients. Vertical lines mark values of $\lambda$ where the active set changes, and are labeled with model degrees of freedom along the top horizontal axis. Increasing values of $\lambda$ force the coefficient estimates toward zero (see [3] for details).
Although one can use the results from the call to `lasso_screenr` as contained in `uniobj1`, the logistic model parameter estimates produced by regularization are biased toward zero. Therefore one might choose to re-fit the selected model [2] using ordinary maximum-likelihood logistic regression [7] as implemented in `logreg_screenr`. We can refit the AIC-best model using:
uniobj2 <- logreg_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7,
                          data = unicorns, Nfolds = 10, seed = 123)
saveRDS(uniobj2, file = "uniobj2.rds")
Note that the variable `Q4` was excluded from the formula. Henceforth, this refitted final model will be used to validate screening.
The logit-scale maximum-likelihood estimates and confidence limits from the AIC-best model are obtained using:
confint(uniobj2)
One can also obtain the adjusted odds ratios and/or omit the intercept. For example, the adjusted odds ratios for the six predictors can be obtained using:
confint(uniobj2, or = TRUE, intercept = FALSE)
The next task is identifying whether sufficiently effective pre-test screening is possible and, if so, selecting the most appropriate screening threshold. That task requires careful consideration and expert judgment, neither of which are provided by the screenr package. However the screenr package does provide the results that are most relevant to that task.
The receiver-operating characteristic (ROC) provides a measure of the accuracy of screening at multiple thresholds. The screenr package incorporates k-fold cross-validation to estimate the out-of-sample performance using only the training data (see also 'External validation on entirely new data', below). The ROC curve is a plot of sensitivity on the false-positive fraction (1 - specificity) or, equivalently, a plot of sensitivity against specificity with specificity plotted in decreasing order. The ROC curves for the unicorns are displayed using the R code
plot(uniobj2)
which plots the overly optimistic in-sample ROC and the cross-validated out-of-sample ROC from the AIC-best model fit. 95% confidence intervals for sensitivity and specificity are plotted at the local maxima (largest distances perpendicular to the 1:1 line). Those maxima are the complete set of options for screening thresholds. A partial AUC of 0.5 would be produced by random assignment, and a value of 1.0 indicates perfect classification for that portion of the ROC curve for which sensitivity is at least 0.8. The partial area under the curve (partial AUC) here is good compared with random assignment. However, sensitivity and specificity at the local maxima of the ROC curve are more relevant to test screening. It is difficult to read sensitivities and specificities from the ROC curve itself; instead, the numerical values can be obtained using:
roc_maximas <- get_what(from = uniobj2, what = "ROCci", se_min = 0.9)
print(roc_maximas)
The argument `what = "ROCci"` specifies extraction of the threshold probabilities of testing positive and the cross-validated sensitivities and specificities, along with their 95% confidence limits, at the local maxima along the cross-validated ROC curve. The argument `se_min = 0.9` limits the local maxima to those which produced a cross-validated sensitivity estimate of at least 0.90. Here there are ten threshold options for screening. For example, testing those unicorns for which the predicted probability of testing positive is 0.00685 (row 5) or larger would have an out-of-sample sensitivity of 0.959 and specificity of 0.627.
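How a single threshold generates one (sensitivity, specificity) pair can be sketched directly; `phat` and `y` below are toy values, not output from the unicorn models:

```r
## Toy predicted probabilities and observed test results:
phat <- c(0.002, 0.010, 0.004, 0.030, 0.001, 0.020)
y    <- c(0,     1,     0,     1,     0,     0)

## Screen in every subject at or above the threshold:
threshold <- 0.00685
screened_in <- phat >= threshold

sens <- sum(screened_in & y == 1) / sum(y == 1)   # fraction of positives caught
spec <- sum(!screened_in & y == 0) / sum(y == 0)  # fraction of negatives screened out
```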
For example, suppose we have screening results on two additional unicorns who have not been tested previously. We can compute their probabilities of testing positive using the `predict` method for `logreg_screenr`-class objects:
new_corns <- data.frame(ID = c("Alice D.", "Bernie P."),
                        testresult = c(NA, NA),
                        Q1 = c(0, 0), Q2 = c(0, 0), Q3 = c(0, 1),
                        Q5 = c(0, 1), Q6 = c(0, 0), Q7 = c(0, 0))
new <- predict(uniobj2, newdata = new_corns)
print(new)
If the chosen screening threshold were 0.00686, then testing Bernie P. but screening out Alice D. would have point estimates of sensitivity and specificity of 0.959 and 0.628, respectively, provided the data from those two unicorns came from the same statistical population as the study data.
The estimates of screening performance based on cross-validation using only the training data will hold for screening on entirely new subjects if and only if those new subjects are from the same statistical population as the subjects in the training sample. However, subjects from other geographic areas, demographic groups and other health facilities may differ from those in the training sample. In that case, cross-validated estimates of sensitivity and specificity might still be overly optimistic. Therefore it is highly desirable to externally validate the screening tool prior to use in new populations of subjects.
External validation requires a repetition of the study for the new population(s). For example, a follow-up study was performed on unicorns who sought care at facilities that were not represented in the screening sample. A total of `r dim(val_data)[1]` new unicorns were interviewed and tested for UIV as part of that follow-up study. External validation consists of assessment of the performance of the screening tool on those new subjects.
The `predict` method provides the means to predict test positivity among the new subjects:
## Drop the column not used by the final model:
val_data <- val_data[, -5]
new_preds <- predict(uniobj2, newdata = val_data)
head(new_preds)
Note that we are predicting using the model fit from the initial study, as contained in `uniobj2`. The data frame `val_data` contains the new data from the follow-up study. The data frame `new_preds` contains the results of tests from the new subjects and the predicted probabilities of a positive outcome based on the AIC- and BIC-best models fitted to the training data. The questionnaire responses from the new subjects are also included; those are needed to assess performance of simplified screening based on the sums of question weights.
The performance of the screening tool when applied to the new subjects is again measured using the ROC. The ROC for testing in the new subjects is obtained using the `roc` function from the pROC package [6]:
new_roc <- pROC::roc(testresult ~ phat, data = new_preds, auc = TRUE)
class(new_roc)
The pROC package provides a `plot` method for `roc`-class objects. Note that the syntax of the arguments for `roc`-class objects differs from usage in the screenr package:
plot(new_roc, print.auc = TRUE, partial.auc = c(0.8, 1.0), type = "S",
     partial.auc.focus = "sensitivity", partial.auc.correct = TRUE)
Note that the partial AUC is slightly smaller than the partial AUC from cross-validation of the training data. However, the `roc_ci` function provided by the screenr package provides more detailed information, including confidence intervals for sensitivity and specificity:
new_perf <- roc_ci(new_roc, se_min = 0.8)
print(new_perf)
There are two basic approaches to implementation of test screening based on the results from the unicorn study. The first requires the use of smartphones or tablets, and the second requires only the ability to count.
First, a smartphone or tablet application could be developed which contains the questionnaire and code that mimics the functionality of the `predict` method. The predicted probability of infection is given by
$$P = \frac{1}{1 + \exp(-Xb)}$$
where X is a (design) matrix containing values of 1 in the first column and the 0/1 question responses in the remaining columns, and b is a column vector of the estimated logit-scale coefficients, including the intercept.
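Such an application needs only the coefficient vector and one subject's responses. A minimal sketch for a single subject; the coefficient values below are hypothetical, not the estimates from the unicorn study:

```r
## Hypothetical logit-scale coefficients, intercept first:
b <- c(-5.0, 1.2, 0.9, 1.5, 0.7, 1.1, 0.8)

## One subject's question responses, with a leading 1 for the intercept:
x <- c(1, 0, 1, 0, 0, 1, 0)

## Predicted probability of testing positive:
P <- 1 / (1 + exp(-sum(x * b)))
```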
The interviewer would enter the responses to the questions and the application would indicate whether the subject should be referred for testing based on the internally computed value of P. This is the ideal approach, and should be used wherever possible. However, an approach which does not require electronic technology may be needed in many health-care settings.
Alternatively, the non-zero coefficient estimates could be rescaled and rounded to whole numbers between 1 and m, where m is some number from, say, 3 to 10. Those rounded values are used as weights for the question responses, and the score for an individual is the sum of the question weights. The choice of m involves a trade-off: small values, say m = 3, make it easy to add up or tally the scores, but will also degrade performance in terms of sensitivity and specificity. Large values of m may approach the performance of screening based on the predicted probabilities of infection, but require more difficult addition and are therefore more prone to human error.
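The rescale-and-round idea can be sketched as follows; the coefficients are hypothetical, and `easy_tool()`'s actual rescaling may differ:

```r
## Hypothetical non-zero logit-scale coefficients (intercept excluded):
b <- c(Q1 = 1.2, Q2 = 0.9, Q3 = 1.5, Q5 = 0.7, Q6 = 1.1, Q7 = 0.8)

## Rescale so the largest coefficient maps to m, then round to integers:
m <- 3
wts <- round(m * b / max(b))

## A subject's score is the sum of the weights for "yes" responses:
responses <- c(Q1 = 0, Q2 = 1, Q3 = 1, Q5 = 0, Q6 = 0, Q7 = 0)
score <- sum(wts * responses)
```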
We can explore this strategy for UIV screening in unicorns using the `easy_tool` function:
et_3 <- easy_tool(uniobj2, max = 3, model = "minAIC", crossval = TRUE)
class(et_3)
Recall that `uniobj2` contains the results of maximum-likelihood estimation. The argument `max` represents m, in this case 3. The remaining arguments specify creation of question weights from the AIC-best model, with performance based on the cross-validated results in `uniobj2`. The resulting object, `et_3`, is of class `easy_tool`. Methods for that class are found by executing:
methods(class = "easy_tool")
The question weights can be extracted using the `get_what` method:
qwts <- get_what(from = et_3, what = "QuestionWeights")
print(qwts)
The decision to use those question weights for screening should be based on their receiver-operating characteristic. Again, a plot method is available to display the ROC curve:
plot(et_3)
Note that the AUC is almost as large as the AUC from the actual model fit. However, there are fewer local maxima, and therefore fewer choices for screening thresholds. Again, the screening thresholds for the scores based on the sums of the question weights can be printed:
qw_maximas <- get_what(from = et_3, what = "ROC")
print(qw_maximas)
Referral for testing all unicorns having a screening score (threshold) of 3 or more will have point estimates for sensitivity and specificity of 0.98 and 0.55, respectively.
An example of a simple clinical screening form based on question weights is shown in the figure, below. NOTE: Screening questions for which the question weights equal 0 can and should be eliminated from the screening form.
knitr::include_graphics("UniTool.png")
How will implementation of that screening tool affect testing? To help answer that question, we can compute the ratio of the total number of tests performed to the anticipated number of positive test results, and the anticipated prevalence among those who are screened out of testing. Effective screening will greatly reduce the prevalence among those who are screened out of testing and will have high sensitivity. Efficient screening will reduce the ratio of the number of tests to positive test results. Both measures are obtained using the `ntpp` (number of tests per positive) function:
ntpp(et_3)
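The arithmetic behind those two measures can be sketched from sensitivity, specificity, and prevalence alone. The values below are toys, and `ntpp()`'s internals may differ:

```r
sens <- 0.98    # toy sensitivity of the screening score
spec <- 0.55    # toy specificity of the screening score
prev <- 0.015   # toy prevalence of infection

## Fraction of subjects screened in, and the tests-per-positive ratio:
screened_in   <- sens * prev + (1 - spec) * (1 - prev)
tests_per_pos <- screened_in / (sens * prev)

## Prevalence among those screened OUT of testing:
prev_untested <- ((1 - sens) * prev) /
  ((1 - sens) * prev + spec * (1 - prev))
```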
After screening out unicorns having scores less than 3 (row 4, as identified by sensitivity and specificity), the prevalence of UIV among the untested should be approximately 0.0006 (0.06%). In contrast, the prevalence of UIV among all subjects in the training data is `r (pbar <- round(mean(unicorns$testresult, na.rm = TRUE), digits = 4))`.
The ratio of total tests to positive results is 28.7 when testing unicorns with a score of at least 3. Without screening, in contrast, an average of approximately `r round(1 / pbar, digits = 1)` unicorns would need to be tested in order to detect a single infection.
The screenr package eases the workload for development and evaluation of screening tools. Hopefully screenr will prove useful to both novice and experienced R users. This tutorial documents only the basic steps; much more is possible. Execute `help(screenr)` in an R session to obtain a broader list of capabilities. End-users are encouraged to further explore the methods that are available for objects. Make use of the R commands `class(some_object)` and `methods(class = "some_class")`, where `some_object` is the result of some function call and `"some_class"` is the class of that object. Then use `help(methodname.some_class)` to obtain help for use of the method `methodname`.
Some may want results that are not directly available from screenr functions and methods. In those cases, the `get_what` methods may provide an easy way to extract the results contained in the objects produced by, for example, the `glmpath::glmpath` and `pROC::roc` functions. However, experienced R users may also use `str(some_object)` to identify other components for extraction using the idiom `some_object$some_component`.
1. Clemens SL, Macneal KD, Alons CL, Cohn JE. Screening algorithms to reduce burden of pediatric HIV testing: A systematic review and meta-analysis. The Pediatric Infectious Disease Journal. 2020; 39(10):e303-e309. DOI: 10.1097/INF.0000000000002715.
2. Hastie T, Tibshirani R, Friedman R. The Elements of Statistical Learning. Springer, New York. 2009. DOI: 10.1007/b94608.
3. Park MY, Hastie T. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B. 2007; 69(4):659-677. DOI: 10.1111/j.1467-9868.2007.00607.x.
4. Linden A. Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. Journal of Evaluation in Clinical Practice. 2006; 12(2):132-139. DOI: 10.1111/j.1365-2753.2005.00598.x.
5. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006; 27(8):861-874. DOI: 10.1016/j.patrec.2005.10.010.
6. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12(77):1-8. DOI: 10.1186/1471-2105-12-77.