ggPredict() - Visualize multiple regression model

To reproduce this document, install the R package ggiraphExtra from GitHub.
install.packages("devtools")
devtools::install_github("cardiomoon/ggiraphExtra")

Linear regression model

Simple linear regression model

In a univariate regression model, you can use a scatter plot to visualize the model. For example, you can fit a simple linear regression model to the radial data included in the moonBook package. The radial data contain demographic and laboratory data of 115 patients who underwent IVUS (intravascular ultrasound) examination of a radial artery after transradial coronary angiography. The NTAV (normalized total atheroma volume measured by IVUS, in cubic mm) is a quantitative measure of atherosclerosis. Suppose you want to predict the amount of atherosclerosis (NTAV) from age.

require(moonBook)   # for use of data radial
fit=lm(NTAV~age,data=radial)
summary(fit)

You can get the regression equation from the summary of the regression model:

y=`r round(coef(fit)[2],2)`*x+`r round(coef(fit)[1],2)`
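The same equation can also be assembled as a string from the fitted coefficients. The following is a minimal sketch, not part of ggiraphExtra: the helper name lm_equation() is hypothetical, and the built-in cars data stands in for radial so that the example runs without moonBook.

```r
# Hypothetical helper (not from ggiraphExtra): format a simple linear
# regression as "y=slope*x+intercept", handling the sign of the intercept.
lm_equation <- function(model, digits = 2) {
  b <- unname(coef(model))  # b[1] = intercept, b[2] = slope
  slope <- formatC(b[2], format = "f", digits = digits)
  intercept <- formatC(b[1], format = "f", digits = digits, flag = "+")
  paste0("y=", slope, "*x", intercept)
}

# Stand-in data: stopping distance versus speed from the built-in cars data
fit_demo <- lm(dist ~ speed, data = cars)
lm_equation(fit_demo)
# "y=3.93*x-17.58"
```

With the model from this section, the call would be lm_equation(fit).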

You can visualize this model easily with the ggplot2 package.

require(ggplot2)
ggplot(radial,aes(y=NTAV,x=age))+geom_point()+geom_smooth(method="lm")

You can make an interactive plot easily with the ggPredict() function included in the ggiraphExtra package.

require(ggiraph)
require(ggiraphExtra)
require(plyr)
ggPredict(fit,se=TRUE,interactive=TRUE)

With this plot, you can identify the points and see the regression equation with your mouse.

Multiple regression model without interaction

You can make a regression model with two predictor variables. Now you can use age and sex as predictor variables.

fit1=lm(NTAV~age+sex,data=radial)
summary(fit1)

From the result of the regression analysis, you can get the regression equations for female and male patients:

For a female patient, y=`r round(coef(fit1)[2],2)`*x+`r round(coef(fit1)[1],2)`
For a male patient, y=`r round(coef(fit1)[2],2)`*x+`r round(coef(fit1)[1]+coef(fit1)[3],2)`
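In a model without interaction, the two groups share one slope and differ only in the intercept. As a hedged illustration, the sketch below checks this against predict() using the built-in ToothGrowth data as a stand-in for radial (dose plays the role of age, supp the role of sex); the object names are made up for this example.

```r
# Stand-in for lm(NTAV~age+sex): one continuous and one two-level predictor
fit_demo <- lm(len ~ dose + supp, data = ToothGrowth)
b <- coef(fit_demo)  # names: (Intercept), dose, suppVC

# Reference level (OJ) uses the intercept as is; level VC adds its coefficient
line_OJ <- function(x) b[["dose"]] * x + b[["(Intercept)"]]
line_VC <- function(x) b[["dose"]] * x + b[["(Intercept)"]] + b[["suppVC"]]

# Cross-check both hand-built lines against predict()
doses <- c(0.5, 1, 2)
supp_levels <- levels(ToothGrowth$supp)
new_OJ <- data.frame(dose = doses, supp = factor("OJ", levels = supp_levels))
new_VC <- data.frame(dose = doses, supp = factor("VC", levels = supp_levels))
all.equal(line_OJ(doses), unname(predict(fit_demo, new_OJ)))
all.equal(line_VC(doses), unname(predict(fit_demo, new_VC)))
```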

You can visualize this model with the ggplot2 package.

equation1=function(x){coef(fit1)[2]*x+coef(fit1)[1]}
equation2=function(x){coef(fit1)[2]*x+coef(fit1)[1]+coef(fit1)[3]}

ggplot(radial,aes(y=NTAV,x=age,color=sex))+geom_point()+
        stat_function(fun=equation1,geom="line",color=scales::hue_pal()(2)[1])+
        stat_function(fun=equation2,geom="line",color=scales::hue_pal()(2)[2])

You can make an interactive plot easily with the ggPredict() function included in the ggiraphExtra package.

ggPredict(fit1,se=TRUE,interactive=TRUE)

Multiple regression model with interaction

You can make a regression model with two predictor variables and their interaction. Now you can use age, DM (diabetes mellitus) and the interaction between age and DM as predictor variables.

fit2=lm(NTAV~age*DM,data=radial)
summary(fit2)

The regression equations in this model are as follows. For patients without DM (DM=0), the intercept is 49.65 and the slope is 0.29. For patients with DM (DM=1), the intercept is 49.65-20.86 and the slope is 0.29+0.35.

For patients without DM(DM=0), y=`r round(coef(fit2)[2],2)`*x+`r round(coef(fit2)[1],2)`
For patients with DM(DM=1), y=`r round((coef(fit2)[2]+coef(fit2)[4]),2)`*x+`r round((coef(fit2)[1]+coef(fit2)[3]),2)`
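With a two-level grouping variable and a full interaction, the group-specific intercept and slope are the same as those from fitting each group separately, which is why a per-group stat_smooth(method="lm") draws the model's lines. The sketch below illustrates this with the built-in ToothGrowth data as a stand-in for radial; the object names are hypothetical.

```r
# Stand-in for lm(NTAV~age*DM): continuous predictor times a two-level factor
fit_demo <- lm(len ~ dose * supp, data = ToothGrowth)
b <- unname(coef(fit_demo))  # order: intercept, dose, suppVC, dose:suppVC

# Intercept and slope for each group, built from the four coefficients
lines_demo <- data.frame(
  supp      = c("OJ", "VC"),
  intercept = c(b[1], b[1] + b[3]),
  slope     = c(b[2], b[2] + b[4])
)

# Cross-check: a separate regression within the VC group gives the same line
fit_VC <- lm(len ~ dose, data = subset(ToothGrowth, supp == "VC"))
all.equal(unname(coef(fit_VC)), c(b[1] + b[3], b[2] + b[4]))
```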

You can visualize this model with ggplot2.

ggplot(radial,aes(y=NTAV,x=age,color=factor(DM)))+geom_point()+stat_smooth(method="lm",se=FALSE)

You can make an interactive plot easily with the ggPredict() function included in the ggiraphExtra package.

ggPredict(fit2,colorAsFactor = TRUE,interactive=TRUE)

Multiple regression model with two continuous predictor variables with or without interaction

You can make a regression model with two continuous predictor variables and their interaction. Now you can use age and weight (body weight in kilograms) as predictor variables.

fit3=lm(NTAV~age*weight,data=radial)
summary(fit3)

From the analysis, you can get the regression equation for any body weight. For example, for a patient with a body weight of 40 kg, the intercept is 37.61+(-0.10416)*40 and the slope is -0.33+0.01468*40.

For bodyweight 40kg, y=`r round(coef(fit3)[2]+coef(fit3)[4]*40,2)`*x+`r round(coef(fit3)[1]+coef(fit3)[3]*40,2)`
For bodyweight 50kg, y=`r round(coef(fit3)[2]+coef(fit3)[4]*50,2)`*x+`r round(coef(fit3)[1]+coef(fit3)[3]*50,2)`
For bodyweight 90kg, y=`r round(coef(fit3)[2]+coef(fit3)[4]*90,2)`*x+`r round(coef(fit3)[1]+coef(fit3)[3]*90,2)`
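The same calculation can be wrapped in a small function that returns the intercept and slope at any value of the second predictor. This is only a sketch under stated assumptions: line_at() is a hypothetical helper, and the built-in mtcars data stands in for radial (wt plays the role of age, hp the role of weight).

```r
# Stand-in for lm(NTAV~age*weight): two continuous predictors with interaction
fit_demo <- lm(mpg ~ wt * hp, data = mtcars)

# Hypothetical helper: line in x (first predictor) at a fixed value z of the
# second predictor. Coefficient order: intercept, x, z, x:z.
line_at <- function(model, z) {
  b <- unname(coef(model))
  c(intercept = b[1] + b[3] * z, slope = b[2] + b[4] * z)
}

# One column of intercept and slope per chosen value of hp
sapply(c(100, 150, 200), function(z) line_at(fit_demo, z))

# Cross-check one point against predict()
l100 <- line_at(fit_demo, 100)
all.equal(unname(l100["intercept"] + l100["slope"] * 3),
          unname(predict(fit_demo, data.frame(wt = 3, hp = 100))))
```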

If you try to visualize this model with a simple ggplot command, it shows only one regression line.

ggplot(radial,aes(y=NTAV,x=age,color=weight))+geom_point()+stat_smooth(method="lm",se=FALSE)

You can easily show this model with the ggPredict() function.

ggPredict(fit3,interactive=TRUE)

Multiple regression model with three predictor variables

You can make a regression model with three predictor variables. Now you can use age, weight (body weight in kilograms) and HBP (hypertension) as predictor variables.

fit4=lm(NTAV~age*weight*HBP,data=radial)
summary(fit4)

From the analysis result, you can get the regression equation for a patient without hypertension (HBP=0) and a body weight of 60 kg: the intercept is 64.12+(-0.39685*60) and the slope is -0.67650+(0.01686*60). The equation for a patient with hypertension (HBP=1) and the same body weight: the intercept is 64.12+(-0.39685*60-101.94) and the slope is -0.67650+(0.01686*60)+1.27972+(-0.01666*60).
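In general, a model of the form y~x*z*g has eight coefficients: four contribute to the intercept of the line in x (intercept, z, g and z:g) and four to its slope (x, x:z, x:g and x:z:g). The sketch below collects them by name; line_at3() is a hypothetical helper, and the built-in mtcars data stands in for radial (wt for age, hp for weight, am for HBP).

```r
# Stand-in for lm(NTAV~age*weight*HBP)
fit_demo <- lm(mpg ~ wt * hp * am, data = mtcars)

# Hypothetical helper: line in wt at fixed hp = z and am = g
line_at3 <- function(model, z, g) {
  b <- coef(model)
  intercept <- b[["(Intercept)"]] + b[["hp"]] * z +
    b[["am"]] * g + b[["hp:am"]] * z * g
  slope <- b[["wt"]] + b[["wt:hp"]] * z +
    b[["wt:am"]] * g + b[["wt:hp:am"]] * z * g
  c(intercept = intercept, slope = slope)
}

# Lines for am = 0 and am = 1 at hp = 120
line_at3(fit_demo, z = 120, g = 0)
line_at3(fit_demo, z = 120, g = 1)

# Cross-check one point against predict()
l1 <- line_at3(fit_demo, z = 120, g = 1)
all.equal(unname(l1["intercept"] + l1["slope"] * 2.8),
          unname(predict(fit_demo, data.frame(wt = 2.8, hp = 120, am = 1))))
```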

To visualize this model, you can make a faceted plot with the ggPredict() function. You can see the regression equation of each subset by hovering your mouse over the regression lines.

ggPredict(fit4,interactive = TRUE)

Logistic regression model

Multiple logistic regression model with two predictor variables

Model with interaction

You can use the glm() function to make a logistic regression model. The GBSG2 data in the TH.data package contain data from the German Breast Cancer Study Group 2. Suppose you want to predict the event status (cens) from the number of positive nodes (pnodes) and hormonal therapy (horTh).

require(TH.data) # for use of data GBSG2
fit5=glm(cens~pnodes*horTh,data=GBSG2,family=binomial)
summary(fit5)

You can easily visualize this model with the ggPredict() function.

ggPredict(fit5,se=TRUE,interactive=TRUE,digits=3)
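In a logistic regression the coefficients describe a straight line on the log-odds scale, and a fitted probability curve is the inverse logit of that line. The sketch below shows the conversion with plogis() on simulated data, so it does not reproduce the GBSG2 results; the data frame demo and the function prob_at() are made up for illustration.

```r
# Simulated stand-in data: count predictor x and binary group g (assumed values)
set.seed(1)
n <- 200
demo <- data.frame(x = rpois(n, 4), g = rbinom(n, 1, 0.5))
demo$y <- rbinom(n, 1, plogis(-1 + 0.3 * demo$x - 0.5 * demo$g))

fit_demo <- glm(y ~ x * g, data = demo, family = binomial)
b <- unname(coef(fit_demo))  # order: intercept, x, g, x:g

# Probability curve: inverse logit of the group-specific line in x
prob_at <- function(x, g) plogis(b[1] + b[3] * g + (b[2] + b[4] * g) * x)

# Cross-check against predict() on the response scale
all.equal(prob_at(5, 1),
          unname(predict(fit_demo, data.frame(x = 5, g = 1), type = "response")))
```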

Model without interaction

You can make a multiple logistic regression model with no interaction between the predictor variables.

fit6=glm(cens~pnodes+horTh,data=GBSG2,family=binomial)
summary(fit6)
ggPredict(fit6,se=TRUE,interactive=TRUE,digits=3)

Multiple logistic regression model with two continuous predictor variables

You can make a multiple logistic regression model with two continuous variables and their interaction.

fit7=glm(cens~pnodes*age,data=GBSG2,family=binomial)
summary(fit7)
ggPredict(fit7,interactive=TRUE)

You can adjust the number of regression lines with the parameter colorn. For example, you can draw 100 regression lines with the following R command.

ggPredict(fit7,interactive=TRUE,colorn=100,jitter=FALSE)
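To get a feel for what several regression lines for a continuous second predictor mean, the sketch below computes predicted probability curves at a few fixed values of that predictor. It is not ggPredict()'s implementation: how ggPredict() chooses the values is not shown in this document, so evenly spaced values over the observed range are an assumption here, and the data are simulated.

```r
# Simulated stand-in data with two continuous predictors (assumed values)
set.seed(2)
n <- 300
demo <- data.frame(x = runif(n, 0, 10), z = runif(n, 20, 80))
demo$y <- rbinom(n, 1, plogis(-2 + 0.3 * demo$x + 0.01 * demo$z))
fit_demo <- glm(y ~ x * z, data = demo, family = binomial)

n_lines <- 5  # plays the role of colorn in this sketch

# Every combination of 50 x values and n_lines z values (x varies fastest)
grid <- expand.grid(
  x = seq(min(demo$x), max(demo$x), length.out = 50),
  z = seq(min(demo$z), max(demo$z), length.out = n_lines)
)
grid$prob <- predict(fit_demo, newdata = grid, type = "response")

# One column per z value: each column is one predicted probability curve
curves <- matrix(grid$prob, ncol = n_lines)
dim(curves)
```

Each column of curves could then be drawn as one line, for example with matplot().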

Multiple logistic regression model with three predictor variables

You can make a multiple logistic regression model with three predictor variables.

fit8=glm(cens~pnodes*age*horTh,data=GBSG2,family=binomial)
summary(fit8)
ggPredict(fit8,interactive=TRUE,colorn=100,jitter=FALSE)

ggiraphExtra documentation built on Oct. 23, 2020, 7:39 p.m.