knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
library(screenr)
data(unicorns)
data(val_data)

Introduction {-}

The screenr package eases the development and validation of pre-testing screening tools of the sort reviewed by Clemens et al. [1], but is applicable to any test-screening problem. Universal testing for a condition can be impractical if the test procedure is relatively expensive and the condition is rare. Testing only those subjects having a sufficiently large probability of testing positive may instead be viable if a screening procedure can be developed which has an acceptable sensitivity. That may be possible if easily measured or recorded risk factors for test positivity can be identified.

This vignette demonstrates the development and validation of such a screening tool using lasso-like L1 regularization [2] of logistic models, as implemented in the glmpath package [3], and also model fitting using maximum-likelihood estimation. Regularization enables robust model selection, but the parameter estimates are biased toward zero. Therefore one might choose to re-fit the model selected by regularization using maximum-likelihood estimation. Model performance is evaluated using receiver-operating characteristics (ROC) [4,5].

An (Artificial) Example {-}

Unicorns suffer an epidemic of infection by the Unicorn Immunodeficiency Virus (UIV). There is currently no cure for UIV infection, but highly effective antiretroviral therapy has the potential to block forward transmission and to avert death from opportunistic infections associated with UIV infection. Therefore it is critical that UIV infection be detected and that infected unicorns be enrolled in treatment. However, UIV infection is sufficiently rare that universal testing of unicorns at health-care entry points is impractical.

A sample of `r dim(unicorns)[1]` properly consented adult unicorns were enrolled in a study aimed at evidence-based targeting of testing for UIV infection. The subjects were administered a questionnaire identifying the presence or absence of putative risk factors for UIV infection. The prevalence of UIV is low, and therefore universal testing was deemed impractical. The challenge then is to identify unicorns who should be prioritized for testing. Because UIV is transmissible and fatal if left untreated, it is important that the screening tool have an acceptably high sensitivity. The screenr package enables development and validation of such screening tools.

The screening questionnaire included seven questions, selected on the basis of published information on infection risk. The data consist of the responses to the screening questions Q1, ..., Q7 (coded as 1 for an affirmative response and 0 for a negative response) and the testing outcome (testresult), again coded 0 and 1 for negative and positive results, respectively.

NOTE: It is critically important that all of the putative risk factors have a positive association with the outcome. That is, the questionnaire must be designed so that affirmative (yes) responses are hypothesized to indicate an increased risk of UIV infection.

The data from the unicorn study look like:

## The first six lines of the unicorn screening data:
head(unicorns)

The prevalence of UIV in the sample of unicorns is `r mean(unicorns$testresult, na.rm = TRUE)`. Under universal testing at health-care entry points, discovery of a single infection would require, on average, testing `r round(1 / mean(unicorns$testresult, na.rm = TRUE))` unicorns.

Screening Tool Development {-}

This demonstration of screening-tool development and validation consists of five steps:

  1. Model selection
  2. Refitting the final model
  3. Selection of a screening threshold
  4. External validation on entirely new data
  5. Implementation

In practice, unfortunately, the fourth step is sometimes omitted due to resource limitations.

Model selection {-}

The first step is the selection of a parsimonious logistic screening model. A method known as L1 regularization has desirable properties for that task [2,3]. The function lasso_screenr() provides easy, integrated access to the L1 regularization algorithm of Park and Hastie [3], as implemented in the glmpath R package, using a convenient formula interface:

uniobj1 <- lasso_screenr(testresult ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7,
                         data = unicorns, Nfolds = 10, seed = 123)

The formula testresult ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7 is an expression of the statement "predict testresult from the seven covariates Q1, ..., Q7." The argument Nfolds = 10 specifies the desired number of partitions (folds) to be used for cross-validation (discussed under 'Selection of a screening threshold', below). The optional seed argument specifies the starting value for the random-number generator used for cross-validation data splitting. Here it is set to a fixed value to ensure reproducibility. By default, the seed is set by the system clock, and therefore results will vary from run to run. Set the seed only if reproducible results are needed.

The fitting algorithm computes logit-scale coefficients along the entire regularization path, and has the effect of selecting covariates based on penalization by the L1-norm of their coefficients, similar to the ordinary lasso [2]. Unlike the ordinary lasso, the algorithm also imposes a very small penalty on the L2-norm, which mitigates the adverse effects of any strong correlations among the predictors and provides useful estimates even if the response variable is perfectly separable by the predictors (unlikely, albeit possible, in test-screening applications).
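In symbols, writing $\ell(\beta)$ for the binomial log-likelihood, the coefficients along the path solve a criterion of the form
$$\hat{\beta}(\lambda) = \underset{\beta}{\arg\min} \left\{ -\ell(\beta) + \lambda \lVert \beta \rVert_1 + \frac{\epsilon}{2} \lVert \beta \rVert_2^2 \right\}$$
where $\lambda$ is the L1 regularization parameter and $\epsilon$ is the small, fixed L2 penalty. This is only a sketch of the criterion; see [3] for the precise formulation, including the treatment of the intercept.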

IMPORTANT: The object returned by lasso_screenr (or any of the other "_screenr" functions) contains information critical to validation and implementation of test screening. Therefore the object should be saved for future use, for example:

saveRDS(uniobj1, file = "uniobj1.rds")

The resulting lasso_screenr-class object (uniobj1) contains the results of regularization fit to all of the data, along with out-of-sample performance measures estimated by k-fold cross-validation. In other words, uniobj1 contains all of the information needed to develop and (internally) validate a screening tool for diagnostic tests. The screenr package provides the following methods for lasso_screenr-class objects:

methods(class = "lasso_screenr")

The methods should make it mostly unnecessary to access components of lasso_screenr-class objects directly using R code similar to object$component.

For example, the summary method provides basic information about the results of regularization:

summary(uniobj1)

Regularization yielded eight logistic models. The two models along the regularization path which yield the smallest AIC and BIC values, denoted "minAIC" and "minBIC" respectively, are given special status; those are likely the most useful solutions in nearly all settings. The default for pAUC is the area under the ROC curve for which sensitivity is at least 0.8, but see help(lasso_screenr) for other options. However, all solutions are accessible from lasso_screenr-class objects.

The logit-scale coefficients for the two special models are obtained using:

coef(uniobj1)

Note that the coefficients for Q4 have been shrunk to zero in both the AIC- and BIC-best models, which happen to coincide in the unicorn data. In effect, Q4 has been eliminated from both models; it was not an important predictor of the test result. One can also obtain the adjusted odds ratios and/or omit the intercept. For example, the adjusted odds ratios for the seven predictors can be obtained using:

coef(uniobj1, or = TRUE, intercept = FALSE)

The adjusted odds ratios for Q4 are 1.0, again indicating that Q4 was not predictive of the test result. One can also examine the coefficients everywhere the active set changed along the regularization path for either of the special fits by extracting and plotting the regularization path object:

pathobj <- get_what(from = uniobj1, what = "glmpathObj", model = "minAIC")
plot(pathobj, xvar = "lambda", main = "")

The bottom horizontal axis is the regularization parameter $\lambda$, and the vertical axis shows values of the individual coefficients. Vertical lines mark values of $\lambda$ where the active set changes, and are labeled with model degrees of freedom along the top horizontal axis. Increasing values of $\lambda$ force the coefficient estimates toward zero (see [3] for details).

Refitting the final model {-}

Although one can use the results from the call to lasso_screenr as contained in uniobj1, the logistic-model parameter estimates produced by regularization are biased toward zero. Therefore one might choose to re-fit the selected model [2] using ordinary maximum-likelihood logistic regression, as implemented in logreg_screenr. We can refit the AIC-best model using:

uniobj2 <- logreg_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7,
                          data = unicorns, Nfolds = 10,
                          seed =  123)
saveRDS(uniobj2, file = "uniobj2.rds" )

Note that the variable Q4 was excluded from the formula. Henceforth, this refitted final model will be used to validate screening.

The logit-scale maximum-likelihood estimates and confidence limits from the AIC-best model are obtained using:

confint(uniobj2)

One can also obtain the adjusted odds ratios and/or omit the intercept. For example, the adjusted odds ratios for the six predictors can be obtained using:

confint(uniobj2, or = TRUE, intercept = FALSE)

Selection of a screening threshold {-}

The next task is identifying whether sufficiently effective pre-test screening is possible and, if so, selecting the most appropriate screening threshold. That task requires careful consideration and expert judgment, neither of which is provided by the screenr package. However, the screenr package does provide the results most relevant to that task.
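Two standard accuracy measures figure throughout what follows. At any given screening threshold,
$$\mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{specificity} = \frac{TN}{TN + FP},$$
where TP, FN, TN and FP denote the numbers of true positives, false negatives, true negatives and false positives, respectively.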

The receiver-operating characteristic (ROC) provides a measure of the accuracy of screening at multiple thresholds. The screenr package incorporates k-fold cross-validation to estimate the out-of-sample performance using only the training data (see also 'External validation on entirely new data', below). The ROC curve plots sensitivity against the false-positive fraction (1 - specificity) or, equivalently, against specificity arranged in decreasing order. The ROC curves for the unicorns are displayed using the R code

plot(uniobj2)

which plots the overly optimistic in-sample ROC and the cross-validated out-of-sample ROC from the AIC-best model fit. 95% confidence intervals for sensitivity and specificity are plotted at the local maxima (the points at the largest perpendicular distances from the 1:1 line). Those maxima are the complete set of options for screening thresholds. The partial area under the curve (partial AUC), computed over the portion of the ROC curve for which sensitivity is at least 0.8, is good compared with random assignment: a partial AUC of 0.5 would be produced by random assignment, and a value of 1.0 would indicate perfect classification over that portion of the curve. However, the sensitivity and specificity at the local maxima of the ROC curve are more relevant to test screening. Because they are difficult to read from the plot, the numerical values can be obtained using:

roc_maximas <- get_what(from = uniobj2, what = "ROCci", se_min = 0.9)
print(roc_maximas)

The argument what = "ROCci" specifies extraction of the threshold probabilities of testing positive and the cross-validated sensitivities and specificities, along with their 95% confidence limits, at the local maxima along the cross-validated ROC curve. The argument se_min = 0.9 limits the local maxima to those which produced a cross-validated sensitivity estimate of at least 0.90. Here there are ten threshold options for screening. For example, testing those unicorns for which the predicted probability of testing positive is 0.00685 (row 5) or larger would have an out-of-sample sensitivity of 0.959 and specificity of 0.627.

For example, suppose we have screening results on two additional unicorns who have not been tested previously. We can compute their probabilities of testing positive using the predict method for logreg_screenr-class objects:

new_corns <- data.frame(ID = c("Alice D.", "Bernie P."),
                        testresult = c(NA, NA),
                        Q1 = c(0, 0), Q2 = c(0, 0), Q3 = c(0, 1),
                        Q5 = c(0, 1), Q6 = c(0, 0), Q7 = c(0, 0))
new <- predict(uniobj2, newdata = new_corns)
print(new)

If the chosen screening threshold were 0.00686, then Bernie P. would be referred for testing and Alice D. would be screened out, with point estimates of sensitivity and specificity of 0.959 and 0.628, provided the data from those two unicorns come from the same statistical population as the study data.
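In practice, the referral decision is simply a comparison of each predicted probability with the chosen threshold. The following is a minimal sketch, assuming the predicted probabilities appear in the phat column of the predict output (as used with pROC below) and using the illustrative threshold from above:

## Minimal sketch: flag subjects at or above the chosen screening threshold
## (the threshold value is illustrative; phat holds the predicted probabilities)
threshold <- 0.00686
new$refer <- new$phat >= threshold
print(new)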

External validation on entirely new data {-}

The estimates of screening performance based on cross-validation using only the training data will hold for screening on entirely new subjects if and only if those new subjects are from the same statistical population as the subjects in the training sample. However, subjects from other geographic areas, demographic groups, or health facilities may differ from those in the training sample, in which case even the cross-validated estimates of sensitivity and specificity might be overly optimistic. Therefore it is highly desirable to externally validate the screening tool prior to use in new populations of subjects.

External validation requires a repetition of the study for the new population(s). For example, a follow-up study was performed on unicorns who sought care at facilities that were not represented in the screening sample. A total of `r dim(val_data)[1]` new unicorns were interviewed and tested for UIV as part of that follow-up study. External validation consists of assessment of the performance of the screening tool on those new subjects.

The predict method provides the means to predict test-positivity among the new subjects:

## Drop the unused fifth column (assumed here to be Q4, which was excluded
## from the refitted final model)
val_data <- val_data[, -5]
new_preds <- predict(uniobj2, newdata = val_data)
head(new_preds)

Note that we are predicting using the model fit from the initial study, as contained in uniobj2. The data frame val_data contains the new data from the follow-up study. The data frame new_preds contains the test results for the new subjects and their predicted probabilities of a positive outcome (phat) based on the final model fitted to the training data. The questionnaire responses from the new subjects are also included; those are needed to assess the performance of simplified screening based on the sums of question weights.

The performance of the screening tool when applied to the new subjects is again measured using the ROC. The ROC for testing in the new subjects is obtained using the roc function from the pROC package [6]:

new_roc <- pROC::roc(testresult ~ phat, data = new_preds, auc = TRUE)
class(new_roc)

The pROC package provides a plot method for roc-class objects. Note that its argument-naming conventions differ from those used in the screenr package:

plot(new_roc, print.auc = TRUE, partial.auc = c(0.8, 1.0), type = "S",
     partial.auc.focus = "sensitivity", partial.auc.correct = TRUE)

Note that the partial AUC is slightly smaller than the partial AUC from cross-validation of the training data. However, the roc_ci function from the screenr package provides a more informative summary, including confidence intervals for sensitivity and specificity:

new_perf <- roc_ci(new_roc, se_min = 0.8)
print(new_perf)

Implementation {-}

There are two basic approaches to implementation of test screening based on the results from the unicorn study. The first requires the use of smartphones or tablets, and the second requires only the ability to count.

Implementation based on probabilities of testing positive {-}

First, a smartphone or tablet application could be developed which contains the questionnaire and code that mimics the functionality of the predict method. The predicted probability of infection is given by $$P = \frac{1}{1 + \exp(-Xb)}$$ where X is a (design) matrix containing values of 1 in the first column and the 0/1 question responses in the remaining columns, and b is a column vector of the estimated logit-scale coefficients, including the intercept.
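To make the arithmetic concrete, the sketch below codes the formula directly for a single subject. It is illustrative only and not a substitute for the predict method; the argument b is assumed to be a numeric vector of the logit-scale estimates from the fitted model with the intercept first, and answers the subject's 0/1 responses in the same order as the remaining coefficients.

## Illustrative only: P = 1 / (1 + exp(-Xb)) for a single subject.
## b must hold the logit-scale estimates (intercept first) taken from the
## fitted model; answers holds the subject's 0/1 responses in the same order.
screening_probability <- function(b, answers) {
    x <- c(1, answers)             ## prepend 1 for the intercept
    1 / (1 + exp(-sum(b * x)))     ## equivalently, plogis(sum(b * x))
}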

The interviewer would enter the responses to the questions and the application would indicate whether the subject should be referred for testing based on the internally computed value of P. This is the ideal approach, and should be used wherever possible. However, an approach which does not require electronic technology may be needed in many health-care settings.

Simplified implementation based on question weights {-}

Alternatively, the non-zero coefficient estimates could be rescaled and rounded to whole numbers between 1 and m, where m is some number from, say, 3 to 10. Those rounded values are used as weights for the question responses, and the score for an individual is the sum of the question weights. The choice of m involves a trade-off. Small values, say m = 3, make it easy to add up or tally the scores, but they will also degrade performance in terms of sensitivity and specificity. Large values of m may approach the performance of screening based on the predicted probabilities of infection, but the larger sums are more difficult to compute mentally and therefore more prone to human error.
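The rescaling itself is simple arithmetic. The sketch below maps a vector of non-zero logit-scale coefficients onto whole numbers between 1 and m; it illustrates the idea only and is not necessarily the exact algorithm used by the easy_tool function introduced next.

## Illustrative sketch of converting coefficients to integer question weights
## (not necessarily the exact algorithm used by easy_tool)
rescale_weights <- function(b, m = 3) {
    b <- b[b != 0]                                         ## drop eliminated questions
    w <- 1 + (m - 1) * (b - min(b)) / (max(b) - min(b))    ## rescale onto [1, m]
    round(w)                                               ## round to whole numbers
}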

We can explore this strategy for UIV screening in unicorns using the easy_tool function:

et_3 <- easy_tool(uniobj2, max = 3, model = "minAIC", crossval = TRUE)
class(et_3)

Recall that uniobj2 contains the results of maximum-likelihood estimation. The argument max represents m, in this case 3. The remaining arguments specify creation of question weights from the AIC-best model, with performance based on the cross-validated results in uniobj2. The resulting object, et_3, is of class easy_tool. Methods for that class are found by executing:

methods(class = "easy_tool")

The question weights can be extracted using the get_what method:

qwts <- get_what(from = et_3, what = "QuestionWeights")
print(qwts)

The decision to use those question weights for screening should be based on their receiver-operating characteristic. Again, a plot method is available to display the ROC curve:

plot(et_3)

Note that the AUC is almost as large as the AUC from the actual model fit. However, there are fewer local maxima, and therefore fewer choices for screening thresholds. Again, the screening thresholds for the scores based on the sums of the question weights can be printed:

qw_maximas <- get_what(from = et_3, what = "ROC")
print(qw_maximas)

Referring for testing all unicorns having a screening score of 3 or more (the threshold) yields point estimates for sensitivity and specificity of 0.98 and 0.55, respectively.
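As an illustration of the arithmetic performed at the point of care, the sketch below tallies the score for one hypothetical subject and applies the threshold of 3. It assumes that qwts, extracted above, is a named numeric vector of weights for the retained questions; the answers are made up for illustration.

## Illustrative sketch: tally one subject's score and apply the threshold of 3
## (answers is a hypothetical set of 0/1 responses; qwts was extracted above)
answers <- c(Q1 = 0, Q2 = 1, Q3 = 1, Q5 = 0, Q6 = 1, Q7 = 0)
score <- sum(qwts[names(answers)] * answers, na.rm = TRUE)
score >= 3    ## TRUE indicates referral for testing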

An example of a simple clinical screening form based on question weights is shown in the figure, below. NOTE: Screening questions for which the question weights equal 0 can and should be eliminated from the screening form.

knitr::include_graphics("UniTool.png")

Effects of screening on testing results {-}

How will implementation of that screening tool affect testing? To help answer that question, we can compute two measures: the ratio of the total number of tests performed to the anticipated number of positive test results, and the anticipated prevalence of infection among those who are screened out of testing. Effective screening will have high sensitivity and will greatly reduce the prevalence among those who are screened out of testing; efficient screening will reduce the ratio of the number of tests to the number of positive results. Both measures are obtained using the ntpp (number of tests per positive) function:

ntpp(et_3)
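In general terms, writing $p$ for prevalence and $se$ and $sp$ for the sensitivity and specificity at the chosen threshold, the two quantities correspond to the standard relationships
$$\frac{\text{tests}}{\text{positive result}} = \frac{se\,p + (1 - sp)(1 - p)}{se\,p}, \qquad \Pr(\text{infected} \mid \text{screened out}) = \frac{(1 - se)\,p}{(1 - se)\,p + sp\,(1 - p)},$$
shown here for orientation rather than as the exact computation performed by ntpp.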

After screening out unicorns having a score less than 3 (row 4, as identified by sensitivity and specificity), the prevalence of UIV among the untested should be approximately 0.0006 (0.06%). In contrast, the prevalence of UIV among all subjects in the training data is `r (pbar <- round(mean(unicorns$testresult, na.rm = TRUE), digits = 4))`.

The ratio of total tests to positive results is 28.7 when testing unicorns with a score of at least 3. Without screening, in contrast, an average of approximately `r round(1/pbar, digits = 1)` unicorns would need to be tested in order to detect a single infection.

Epilogue {-}

The screenr package eases the workload for development and evaluation of screening tools, and hopefully it will prove useful to both novice and experienced R users. This tutorial documents only the basic steps; much more is possible. Execute help(screenr) in an R session to obtain a broader list of capabilities. End-users are encouraged to further explore the methods that are available for objects. Make use of the R commands class(some_object) and methods(class = "some_class"), where some_object is the result of some function call and "some_class" is the class of that object. Then use help(methodname.some_class) to obtain help for method methodname.

Some may want results that are not directly available from screenr functions and methods. In those cases, the get_what methods may provide an easy way to extract the results contained in the objects produced by, for example, the glmpath::glmpath and pROC::roc functions. However, experienced R users may also use str(some_object) to identify other components for extraction using the idiom some_object$some_component.
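For example, a quick look at the top-level structure of a fitted object (here uniobj1) reveals the component names available for extraction:

## List the top-level components of a fitted object before drilling down with $
str(uniobj1, max.level = 1)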

References {-}

  1. Clemens SL, Macneal KD, Alons CL, Cohn JE. Screening algorithms to reduce burden of pediatric HIV testing: A systematic review and meta-analysis. The Pediatric Infectious Disease Journal. 2020; 39(10):e303-e309. DOI: 10.1097/INF.0000000000002715.

  2. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer, New York. 2009. DOI: 10.1007/b94608.

  3. Park MY, Hastie T. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B. 2007; 69(4):659-677. DOI: 10.1111/j.1467-9868.2007.00607.x

  4. Linden A. Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. Journal of Evaluation in Clinical Practice. 2006; 12(2):132-139. DOI: 10.1111/j.1365-2753.2005.00598.x

  5. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006; 27(8):861-874. DOI: 10.1016/j.patrec.2005.10.010.

  6. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 12(77):1-8. DOI: 10.1186/1471-2105-12-77.


