knitr::opts_chunk$set( collapse = TRUE, message=FALSE, warning = FALSE, comment = NA, fig.height=7, fig.width=7, dpi=150 )
R package multipleROC is for ROC analysis with models with multiple predictors. You can draw a ROC plot with ggplot2 for models with multiple predictors. You can draw several ROC curves in a plot. You can perform automatic stepwise logistic regression analysis and compare the initial and final models.
You can install multipleROC package from github.
install.packages("remotes") remotes::install_github("cardiomoon/multipleROC")
library(multipleROC)
Data radial from package moonBook
is a dataset containing demographic data and laboratory data of 115 patients. If you want to differentiate male and female patients with their heights(in centimeter), you may draw exploratory plot first.
library(ggplot2) library(dplyr) library(tidyr) theme_set(theme_bw()) data(radial,package="moonBook")
ggplot(data=radial)+geom_density(aes(x=height,fill=sex),alpha=0.5) ggplot(data=radial)+geom_boxplot(aes(x=sex,y=height,fill=sex),alpha=0.5)
As you can see, male patients are taller than female patients. But how can I determine the optimal cutoff value?
To draw a ROC curve is one of the useful method to determine the optimal cutoff value. You can perform ROC analysis using the following R command.The following R command makes an object of class multipleROC
and makes a plot.
data(radial,package="moonBook") x=multipleROC(male~height,data=radial)
You can see the ROC plot with sensitivity(Sens), specificity(Spec), positive predictive value(PPV) and negative predictive value(NPV) with the cutoff value 161. These rates comes from the following table.
table(radial$height>=161,radial$male)
If you use the height $\geq$ 161cm to differentiate male patients from female patients, the true positive(TP) is 50(out of 114), true negative(TN) is 49, false positive(FP) is 7 and the false negative(FN) is 8 patients. You can calculate sensitivity, specificity, positive predictive value and negative predictive values by manual. Sensitivity measures how often a test correctly generates a positive result for people who have the condition that’s being tested for (also known as the “true positive” rate).
\begin{equation} Sensitivity=\frac{TP}{Male\, Patients}=\frac{TP}{FN+TP}=\frac{50}{8+50}=86.2 \% \end{equation}
Specificity measures a test’s ability to correctly generate a negative result for people who don’t have the condition that’s being tested for (also known as the “true negative” rate).
\begin{equation} Specificity=\frac{TN}{Female\, Patients}=\frac{TN}{TN+FP}=\frac{49}{49+7}=87.5 \% \end{equation}
Positive predictive value(PPV) is calculated as follows:
\begin{equation} PPV=\frac{TP}{Pts\,with\, height \geq 161cm}=\frac{TP}{TP+FP}=\frac{50}{50+7}=87.7\% \end{equation}
Negative predictive value(PPV) is calculated as follows:
\begin{equation} NPV=\frac{TN}{Pts\,with\, height < 161cm}=\frac{TN}{TN+FN}=\frac{49}{49+8}= 85.9 \% \end{equation}
You can calculate the sensitivity, specificity for all the height data of radial. The radial data are from 115 patients, but 1 patient's height is missed. But the length of unique values of radial$height is 39 due to duplication.
length(unique(radial$height))
You can calculate all the ratio by hand. For your convenience, you can use calSens() function included in multipleROC package.
calSens=function(x,y){ newx=sort(c(unique(x),max(x,na.rm=TRUE)+1)) completeTable=function(res){ if(nrow(res)==1){ res1=matrix(c(0,0),nrow=1) temp=setdiff(c("TRUE","FALSE"),attr(res,"dimnames")[[1]][1]) if(temp=="FALSE") res=rbind(res1,res) else res=rbind(res,res1) res } res } getSens=function(cut){ res=table(x>=cut,y) res=completeTable(res) sens=res[2,2]/sum(res[,2]) spec=res[1,1]/sum(res[,1]) ppv=res[2,2]/sum(res[2,]) npv=res[1,1]/sum(res[1,]) data.frame(x=cut,sens=sens,spec=spec,fpr=1-spec,ppv=ppv,npv=npv,sum=sens+spec) } map_dfr(newx,function(cut){getSens(cut)}) }
result=calSens(radial$height,radial$male) result
The last x value of data.frame result is a arbitary value to ensure the ROC curve start from c(0,0).
You can make exploratory plot of sensitivity and specificity. First, You have to transform the data in long from.
longdf <- result %>% pivot_longer(cols=sens:spec,names_to = "rate") ggplot(data=longdf,aes(x=x,y=value,color=rate))+geom_line()
As you can see, the higher the sensitivity, the lower the specificity. The optimal cutoff value is determined where the sum of specificity and sensitivity is the highest.
result[which.max(result$sum),]
You can see when the height is 161cm, the sum of sensitivity and specificity is the highest. So the best cutoff value of height is 161cm. With the result, you can draw the ROC plot.
result=result[order(result$sens),] ggplot(data=result,aes(x=fpr,y=sens))+geom_line()
Look at the first ROC plot again. In the ROC plot, you can see the cutoff value of lr.eta is 0.545. Where does this value come from? This value comes from the generalized linear model.
fit=glm(male~height,data=radial,family=binomial) fit$fitted.values
You can see the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. You can use this value to make a ROC plot. Again, you can use calSens() function.
result2=calSens(fit$fitted.values,fit$y) result2[which.max(result2$sum),]
The optimal fitted value is 0.5451. You can draw ROC curve with this value.
result2=result2[order(result2$sens),] ggplot(data=result2,aes(x=fpr,y=sens))+geom_line()
What is the relationship of the fitted value of 0.5451 and height 161cm ? As you knows, you can interpretate the logistic equation as follows.
\begin{equation} log(\frac{p}{1-p})=\beta_0+\beta_1*X \end{equation}
This equation is expressed with the following R command.
qlogis(x$cutpoint)=coef(x$fit)[1]+coef(x$fit)[2]*X
So you can calculate the optimal cutpoint of height with the optimal fitted value 0.5451 as follows.
height=(qlogis(x$cutpoint)-unname(coef(x$fit)[1]))/unname(coef(x$fit)[2]) height
In the lower right corner, you can see the area under curve(AUC) and the p value from Wilcoxon Rank Sum test. The p value comes from:
wilcox.test(radial$height,radial$male)
The AUC is calculated by the simpleAUC function included in multipleROC package.
simpleAUC <- function(df){ df=df[order(df$x,decreasing=TRUE),] TPR=df$sens FPR=df$fpr dFPR <- c(diff(FPR), 0) dTPR <- c(diff(TPR), 0) sum(TPR * dFPR) + sum(dTPR * dFPR)/2 } simpleAUC(x$df)
You can convert a multipleROC object to a roc object. You can use this object to calculate auc or compare the AUC of two ROC curves.
class(x) multipleROC2roc=function(x){ pROC::roc(x$fit$y,x$fit$fitted.values,ci=T) } x2 <- multipleROC2roc(x) class(x2) pROC::auc(x2)
You can draw multiple ROC curves in the same plot. First, make a list of multipleROC objects and use the plot_ROC function.
a=multipleROC(male~height,data=radial,plot=FALSE) b=multipleROC(male~age,data=radial,plot=FALSE) c=multipleROC(form=male~weight,data=radial,plot=FALSE) plot_ROC(list(a,b,c),show.eta=FALSE,show.sens=FALSE)
You can make facetted plot with the following R command.
plot_ROC(list(a,b,c),facet=TRUE)
By setting the facet argument TRUE, you can get the facetted plot. Alternatively you can use the facet_grid() function of the package ggplot2.
require(ggplot2) plot_ROC(list(a,b,c))+facet_grid(no~.)
You can make ROC plots with multiple predictors.
multipleROC(male~height+weight+age,data=radial)
This plot is made from the following glm
object.
fit=glm(male~height+weight+age,data=radial,family=binomial)
You can use automatic stepwise backward elmination by AIC in multiple glm model.
final=step(fit,trace=0) summary(final)
You can compare the initial and final model.
anova(final,fit,test="Chisq")
You can draw plot comparing two ROC curves of the initial and the final model.
step_ROC(male~weight+height+age,data=radial)
You can get the anova table.
step_ROC(male~weight+height+age,data=radial,plot=FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.