Current version of this document (in English) is not up to date due to my time limitation. For the current functionality of the package, please consult the manual of the partial.plot
package in R. I hope I will update this document near future so please be patient. I'm sorry for your inconvenience
This package provides tools for visualizing a result of multivariate analyses. The goal of this package is providing a easy-to-use method to visualize results of many kinds of regression models. This package is still in development and may have bugs which can affect your interpretation of the results. Also I'm still designing the interface of the package so future version may lose some functionality in current version. If you find bugs or if you have any requests, please let me know by GitHub or email: paste following code into R to get my address: rawToChar(as.raw(c(109, 111, 103, 64, 102, 102, 112, 114, 105, 46, 97, 102, 102, 114, 99, 46, 103, 111, 46, 106, 112))).
To install the package, just copy & paste following commands into the R console.
```{R, message = FALSE, warning = FALSE, eval = FALSE} install.packages( c("model.adapter", "partial.plot"), type = "source", repos = c( "http://florivory.net/R/repos", options()$repos ) )
-------------------------------------------------------------------------------- # Quick start (for R wizards) Try this one! You can also see the [examples](visual.test.html). ```{R, echo = FALSE, message = FALSE} library(partial.plot)
# Load dataset. data(iris) # Create a prediction model by GLM. model <- glm( Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris ) # Load the library. library(partial.plot) # Draw relationship between sepal length and petal length. info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16) # Add a legend. pp.legend(info, "topleft") # Draw 3D plot. info <- partial.plot( model, c("Sepal.Length", "Petal.Width"), col = terrain.colors, theta = 20 )
In this time, we use the Fisher's iris dataset. This dataset having petal length (Petal.Length), petal width (Petal.Width), sepal length (Sepal.Length) and sepal width (Sepal.Width) data of 3 species of Iris (setosa, versicolor, and virginica).
We can load the dataset and see structure of the data set by following code.
# Load the dataset. data(iris) # See the structure of the dataset. str(iris) # Visualize data. par(mfrow = c(2, 2)) plot( Petal.Length ~ Sepal.Length, data = iris, pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1 ) plot( Petal.Length ~ Sepal.Width, data = iris, pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1 ) plot( Petal.Length ~ Petal.Width, data = iris, pch = as.numeric(Species) + 14, col = as.numeric(Species) + 1 )
In this guide, we will make a model to predict petal length by other variables. For the modeling, we are going to use generalized linear model (GLM). For explanatory variables of the model we will use sepal length, petal width and species. By including interaction terms between sepal length and species, and between petal width and species, we assume that relationships between sepal length and petal length and between petal width and petal length differ between species.
# Make prediction model. model <- glm( Petal.Length ~ (Sepal.Length + Petal.Width) * Species, data = iris ) # Show summary of the model. summary(model)
Now we are going to visualize the result of the GLM analysis with
partial.plot()
. Basic usage of partial.plot()
function is as follow.
The example below visualize predicted relationship between sepal length and
petal length of the three species.
# Load library. library(partial.plot) # Visualize predicted relationship between sepal length and petal length. partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
The first argument of the partial.plot()
is model object returned by a
model function.
In this case we used model
object returned by glm()
.
The second argument is a character vector specifying names of explanatory
variables whose relationships to response variable are visualized.
In the example, "Sepal.Length"
and "Species"
are specified.
For this argument, multiple names of factors can be specified.
Those two arguments must be specified.
In addition to the above arguments, we put pch = 16
to make the graph
a bit pretty.
If we want to visualize relationship between petal width and petal length, we can specify the arguments as follow.
# Visualize predicted relationship between petal width and petal length. partial.plot(model, c("Petal.Width", "Species"), pch = 16)
partial.plot()
returns information can be used to produce its legend.
For example, if we write following command, information about the plot
is stored in the info
variable.
```{R, eval = FALSE}
info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16)
By passing `info` to `pp.legend()` function, we can draw a legend using settings used for `partial.plot()`. ```{R} # Add legend. info <- partial.plot(model, c("Sepal.Length", "Species"), pch = 16) pp.legend(info, "topleft")
The usage of pp.legend
function is almost similar to the legend
function.
The first argument is result of partial.plot
, and other arguments are passed
to legend
function.
partial.plot()
can visualize more complicated models.
In this time, we use CO2
dataset in dataset
package of R.
THis dataset having CO2 assimilation rate of plants collected
from Quebec and Mississippi in several CO2 concentrations
after the chilling experiment.
We can control some parameters of partial.plot
to change appearance of the
graph.
We can suppress drawing symbols representing residuals by setting FALSE
to draw.residual
argument.
partial.plot( model, c("Sepal.Length", "Species"), draw.residual = FALSE )
Also, we can suppress drawing predicted partial relationship by specifying
FALSE
to draw.relationship
argument.
partial.plot( model, c("Sepal.Length", "Species"), draw.relationship = FALSE, pch = 16 )
By setting 'extrapolate' to 'TRUE', predicted partial relationships are drawn for range of the explanatory variable for all groups.
info <- partial.plot( model, c("Sepal.Length", "Species"), pch = 16, extrapolate = TRUE ) pp.legend(info, "topleft")
We can change colors of the graph by several ways.
First way to change colors of the graph is passing rainbow()
or
heat.colors()
functions for col
argument of partial.plot()
.
We can use functions that described in help page of rainbow()
function
(type ?rainbow
to see the list).
# Change colors of the graph. info <- partial.plot( model, c("Sepal.Length", "Species"), pch = 16, col = rainbow ) # Colors in the legend is automatically adjusted. pp.legend(info, "topleft")
Second way to change colors is prepare named character vector denoting colors
for each factor levels and specify it for col
argument.
# Prepare a named color vector. col <- c( setosa = "darkgreen", versicolor = "blue", virginica = "orange2" ) # Specify the vector for col argument. info <- partial.plot( model, c("Sepal.Length", "Species"), pch = 16, col = col ) pp.legend(info, "topleft")
We can change other graphic parameters like plot
function.
We changed pch
to set symbol of the in the example above.
But we also use other graphic parameters.
# Set other graphic parameters info <- partial.plot( model, c("Sepal.Length", "Species"), pch = 16, cex = 1.5, xlab = "Sepal length (mm)", ylab = "Petal Length" ) pp.legend(info, "topleft")
At this moment, (maybe) lm()
, glm()
, glm.nb()
, lme()
, lmer()
, glmer()
, glmer.nb()
, glmmadmb()
, MCMCglmm()
, cforest()
, ctree()
, svm()
, randomForest()
, ranger()
, rpart()
, tree()
are supported.
Support for the models are provided by the model.adapter
package.
For the estimation of the relationship between the explanatory and response variable, partial.plot
depends on emmeans
package.
If the specified model is supported by emmeans
package, the predicted relationship is calculated by emmeans
.
Currently, emmeans
is used for lm
, glm
, glm.nb
, lme
, lmer
, glmer
, glmer.nb
, glmmTMB
, glmmadmb
and MCMCglmm
.
When the model is supported by emmeans
, response variables other than focal response variable(s) are fixed at their mean and the relationship is estimated.
When the model is not supported by emmeans
, the relationship is calculated by partial.plot
.
In this case, conditional relationship on the dataset is calculated: for the values of response variables other than focal response variables(s), all values in the dataset is used and mean values of predicted response variable is used for the relationship.
2021-06-23: When using a model created with a data.frame having column(s) standardized by the scale
function, partial.plot
produces error because the predict
method of the glmerMod
class stops.
This is because result of the scale
function is a matrix whereas partial.plot
creates data.frame
for prediction as numeric
vector.
To avoid this problem, please convert result of the scale
function to a vector before creating models, perhaps using c()
may be easy.
```{R eval=FALSE, echo=TRUE} data$column1 <- c(scale(data$column1))
## glmer.nb When the formula of a model having functions such as `offset()` and `log()`, `partial.plot` stops because it can't obtain data from the model object. To avoid this problem, please specify original data used for the modeling to the `data` argument. ```{R, eval = FALSE} # Example model <- glmer.nb(y ~ log(x) + (1 | random), data = dat) partial.plot(model, "x", data = dat)
'partial.plot' treats offset term(s) as follows:
Prediction and confidence intervals
: Similar with explanatory variables, partial.plot
uses mean value(s) for offset term(s) for prediction. For example, when the response variable of a model is a number of individuals and the offset term is study area, prediction and confidence interval are calculated using the mean value of study area.
Partial residual : Original values of the variable used for the offset term during model construction are used for calculation of partial residuals.
Due to these specification, prediction and partial residuals don't match each other in some situations. Two methods can solve the problem: 1) disabling partial residual and 2) prepare special dataset for plotting. An easier method is disabling plotting partial residuals by specifying draw.residual = FALSE
to partial.plot
. Of course this method is not a better way, but would be used for some situations. A better, but more complicated method is calculating response variable/offset term before running partial.plot
as shown in the following example.
Currently, following models are not compatible for drawing with offset terms.
glmmADMB::glmmadmb
#----------------------------------------------------------------------------- # Prepare dataset and model. #----------------------------------------------------------------------------- # Load 'Insurance' data containing number of claims from MASS package. # For the details, see '?Insurance' after loading MASS package. utils::data(Insurance, package="MASS") # Convert age from factor to numeric. Insurance$Age.numeric <- c(25, 27, 32.5, 35)[as.numeric(Insurance$Age)] # Model the number of insurance claims by district, type of cars and age. # The number of policyholders is used for the offset term. model <- glm( Claims ~ District + Group + Age.numeric + offset(log(Holders)), data = Insurance, family = poisson ) #----------------------------------------------------------------------------- # Plotting 1: case of failure. #----------------------------------------------------------------------------- # Simply draw partial dependence plot. # Predicted relationship and partial residual don't match each other. pp <- partial.plot(model, c("Age.numeric", "Group")) pp.legend(pp, "topright") #----------------------------------------------------------------------------- # Solution 1: disable drawing partial residual. # In this case, predicted relationship with mean number of policyholders # is drawn. #----------------------------------------------------------------------------- pp <- partial.plot(model, c("Age.numeric", "Group"), draw.residual = FALSE) pp.legend(pp, "topright") #----------------------------------------------------------------------------- # Solution 2: use response variable/offset term. # In this case, we calculate claims count/number of policyholders and set 1 # for number of policyholders of all records before plotting. #----------------------------------------------------------------------------- # Create a copy of the dataset. Insurance2 <- Insurance # Convert the values of response variable from number of claims to number of # claims per number of policyholders. Insurance2$Claims <- Insurance2$Claims / Insurance2$Holders # Set 1 for number of policyholders of all records. Insurance2$Holders <- 1 # Draw partial dependence plot for number of claims per number of policyholders. pp <- partial.plot(model, c("Age.numeric", "District"), data = Insurance2) pp.legend(pp, "topright")
partial.plot
package contains some functions which may be useful.
color.ramp
function creates color vector from factors.
library(partial.plot) plot( Petal.Length ~ Sepal.Length, col = color.ramp(Species), data = iris, pch = 16 )
By similar way to control colors of partial.plot
, we can change colors
produced by color.ramp
.
plot( Petal.Length ~ Sepal.Length, col = color.ramp(Species, rainbow), data = iris, pch = 16 )
col <- c( setosa = "darkgreen", versicolor = "blue", virginica = "orange2" ) plot( Petal.Length ~ Sepal.Length, col = color.ramp(Species, col), data = iris, pch = 16 )
gg.colors
creates gg.plot like colors.
barplot(1:10, col = gg.colors(10))
extraporate
argument to extrapolate
.coda
package when the model used MCMC for estimation.glm.nb
and glmer.nb
.lsmeans
to emmeans
due to discontinuation of lsmeans
.lmer()
, glmer()
, glmmadmb()
, glmmML()
, ranger()
, rpart()
and tree()
.add
option.lsmeans
package.lty
, lwd
and pch
for groups.color.ramp.data.frame()
methodcolor.ramp.numeric()
methodcforest()
, ctree()
, randomForest()
and svm()
.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.