Therefore violin plots are a powerful tool to assist researchers to visualise data, particularly in the quality checking and exploratory parts of an analysis. Violin plots have many benefits:
As shown below for the iris
dataset, violin plots show distribution information that the boxplot is unable to.
library("vioplotx")
We set up the data with two categories (Sepal Width) as follows:
data(iris) summary(iris$Sepal.Width) table(iris$Sepal.Width > mean(iris$Sepal.Width)) iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ] iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]
First we plot Sepal Length on its own:
boxplot(Sepal.Length~Species, data=iris, col="grey")
An indirect comparision can be achieved with par:
par(mfrow=c(2,1)) boxplot(Sepal.Length~Species, data=iris_small, col = "lightblue") boxplot(Sepal.Length~Species, data=iris_large, col = "palevioletred") par(mfrow=c(1,1))
First we plot Sepal Length on its own:
vioplotx(Sepal.Length~Species, data=iris)
An indirect comparision can be achieved with par:
par(mfrow=c(2,1)) vioplotx(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line") vioplotx(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line") par(mfrow=c(1,1))
A more direct comparision can be made with the side
argument and add = TRUE
on the second plot:
vioplotx(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right") vioplotx(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", add = T) title(xlab = "Species", ylab = "Sepal Length") legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
The line median option is more suitable for side by side comparisions but the point option is still available also:
vioplotx(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "point", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2") vioplotx(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "point", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = T) title(xlab = "Species", ylab = "Sepal Length") legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
It may be necessary to include a points
command to fix the median being overwritten by the following plots:
vioplotx(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "point", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2") vioplotx(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "point", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = T) points(1:length(levels(iris$Species)), as.numeric(sapply(levels(iris$Species), function(species) median(iris_large[grep(species, iris_large$Species),]$Sepal.Length))), pch = 21, col = "palevioletred4", bg = "palevioletred2") title(xlab = "Species", ylab = "Sepal Length") legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
Similarly points could be added where a line has been used previously:
vioplotx(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2") vioplotx(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = T) points(1:length(levels(iris$Species)), as.numeric(sapply(levels(iris$Species), function(species) median(iris_large[grep(species, iris_large$Species),]$Sepal.Length))), pch = 21, col = "palevioletred4", bg = "palevioletred2") points(1:length(levels(iris$Species)), as.numeric(sapply(levels(iris$Species), function(species) median(iris_small[grep(species, iris_small$Species),]$Sepal.Length))), pch = 21, col = "lightblue4", bg = "lightblue2") title(xlab = "Species", ylab = "Sepal Length") legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
Here it is aesthetically pleasing and intuitive to interpret categorical differences in mean and variation in a continuous variable.
These extensions to vioplotx
here are based on those provided here:
These have previously been discussed on the following sites:
http://mbjoseph.github.io/2013/06/24/violin.html
http://mbjoseph.github.io/blog/2013/06/24/violin/
http://tagteam.harvard.edu/hub_feeds/1981/feed_items/209875
https://www.r-bloggers.com/split-violin-plots/
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.