In xuan-liang/ggmatplot: Plot Columns of Two Matrices Against Each Other Using 'ggplot2'

library(tidyverse)
library(knitr)
library(kableExtra)
knitr::opts_chunk$set(message = FALSE,
                      warning = FALSE)

# data 

wide_df <- tribble(~restaurant, ~food_rating, ~service_rating, ~ambience_rating, ~overall_rating,
        "R1", 4.3, 3.4, 4.3, 4.9,
        "R2", 4.3, 5.0, 4.5, 4.4,
        "R3", 3.2, 4.4, 5.0, 3.0, 
        "R4", 2.3, 4.6, 4.4, 3.8, 
        "R5", 3.9, 4.8, 4.2, 3.3)

long_df <- pivot_longer(wide_df, contains("rating"), 
                        names_to = "rating_type",
                        names_pattern = "(.+)_rating",
                        values_to = "rating")

# abund_df <- tribble(~Site, ~`Soil dry mass`, ~`Moss`, ~Alopcune, ~Arctlute, ~Pardpull, ~Trocterr, ~Zoraspin,
#                     1, 2.3321, 3.0445, 10, 0, 45, 57, 4,
#                     2, 3.0493, 1.0986, 2, 0, 37, 65, 9,
#                     3, 2.5572, 2.3979, 20, 0, 45, 66, 1)
# 
table_styling <- function(df, ...) {
  kable(df, ..., booktabs = TRUE, linesep = "") %>%
    kable_classic()
}
# 
# data("spider", package = "mvabund")

data(SnowGR, package = "mosaicData")

Summary

The layered grammar of graphics [@Wickham2010-kt], implemented as the ggplot2 package [@Wickham2016] in the statistical language R [@rstats], is a powerful and popular tool for creating versatile statistical graphics. However, this graphical system requires the input data to be organised so that each aesthetic element (e.g., x-coordinate, y-coordinate, color, size) is mapped to a single data column. This creates friction when constructing plots in which an aesthetic element spans multiple columns of the original data, because the user must first reorganise the data.

The ggmatplot package, built upon ggplot2, is an R package that allows quick plotting across the columns of matrices or data frames, with the result returned as a ggplot object. The package is inspired by the function matplot() in the core R graphics system. As such, ggmatplot may be considered a ggplot version of matplot, with the benefit that plots can be customised like any other ggplot object via ggplot2 functions, and with several plot types that are not directly available from matplot (e.g., comparative violin plots). The ggmatplot package is available on the Comprehensive R Archive Network (CRAN), with the latest development source code available at \url{https://github.com/xuan-liang/ggmatplot}.
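For reference, the base R workflow that ggmatplot mirrors can be sketched as follows. This is a minimal illustration rather than code from the package: `ratings` and `overall` are hypothetical objects holding the restaurant ratings used later in this paper, and only `matplot()` and `legend()` from base R graphics are called.

```r
# Hypothetical matrix of ratings: one row per restaurant, one column per rating type.
ratings <- cbind(
  food     = c(4.3, 4.3, 3.2, 2.3, 3.9),
  service  = c(3.4, 5.0, 4.4, 4.6, 4.8),
  ambience = c(4.3, 4.5, 5.0, 4.4, 4.2)
)
overall <- c(4.9, 4.4, 3.0, 3.8, 3.3)

# Sort by the x variable so that the connecting lines run left to right.
ord <- order(overall)

# matplot() plots every column of `y` against `x` in a single call;
# type = "b" draws both points and lines.
matplot(x = overall[ord], y = ratings[ord, ], type = "b",
        pch = 1:3, lty = 1:3, col = 1:3,
        xlab = "Overall rating", ylab = "Rating")
legend("bottomleft", legend = colnames(ratings),
       pch = 1:3, lty = 1:3, col = 1:3, title = "Type")
```

The plot is drawn directly on a base graphics device, so it cannot be customised afterwards with ggplot2 functions.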

Statement of need

Constructing plots with ggplot2 requires the input data to be organised so that data columns map to aesthetic elements. This generally works well when the data are in a long rectangular form, often referred to as "tidy data" [@Wickham2014-gy], where each row represents an observational unit, each column represents a variable, and each cell represents a value. However, what constitutes a variable (or observational unit), and hence a column (or row), in tidy data can depend on interpretation or downstream interest; for example, Tables \ref{tab:tab1} and \ref{tab:tab2} can both be considered tidy data. A clear violation of tidy data principles occurs when the column names contain data values; for example, Table \ref{tab:tab3} has months of the year as column names.

table_styling(wide_df, col.names = c("Restaurant", "Food",
                                     "Service", "Ambience", "Overall"),
              caption = 'Restaurant rating data in "tidy" form. The first column shows the restaurant ID, and the next four columns show the average ratings (out of 5) for food, service, ambience and overall, respectively.') %>% 
  add_header_above(c(" ", "Average rating" = 4))

table_styling(long_df, col.names = c("Restaurant", "Rating type", "Average rating"),
              caption = 'Another form for the restaurant rating data in Table \\ref{tab:tab1}. In Wickham (2014), this format is called "molten" data.')

head(SnowGR[, 1:11]) %>% 
  mutate(Jan = as.character(Jan)) %>% 
  replace_na(list(Jan = ".")) %>% 
  table_styling(caption = 'The first 6 rows and 11 columns of the snowfall data for Grand Rapids, Michigan in the R package \\texttt{mosaicData} (Pruim, Kaplan \\& Horton, 2021).',
                align = "r")

The organisation of the data is largely dependent on the subsequent analysis, and there is no one correct way to do this. Some forms of multivariate data, e.g. Table \ref{tab:tab3}, are prevalent in many scientific fields because they align with the input format of particular modelling software, or because the format is more convenient for entering or viewing the data in a spreadsheet. Unfortunately, this format is not consistent with the format required by ggplot2. Consequently, plotting with ggplot2 interrupts the workflow of a user who is trying to quickly visualise these types of data (as part of their exploratory data analysis, for example). The ggmatplot R package seeks to provide a solution to this common friction.
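To make the reorganisation step concrete, the sketch below reshapes a small table whose column names contain data values into the long form that ggplot2 expects. The data frame `snow_wide` and its values are hypothetical and serve only as an illustration; `pivot_longer()` is from tidyr.

```r
library(tidyr)

# Hypothetical data: month names (data values) appear as column names.
snow_wide <- data.frame(
  season = c("2001-02", "2002-03"),
  Nov = c(1.5, 4.0),
  Dec = c(20.1, 12.3),
  Jan = c(18.4, 25.0)
)

# Move the month names into a `month` column and the measurements into
# `snowfall`, giving one row per season-month combination.
snow_long <- pivot_longer(snow_wide, cols = Nov:Jan,
                          names_to = "month", values_to = "snowfall")

dim(snow_wide)  # 2 rows, 4 columns
dim(snow_long)  # 6 rows, 3 columns
```

Only after this step can `month` be mapped to an aesthetic such as the x-axis, which is the extra work a user faces when the data arrive in the wide layout.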

Examples

In this section, we demonstrate the use of the ggmatplot package and contrast it with the equivalent ggplot2 specification after data wrangling with dplyr and tidyr [@Wickham2019]. We use the example data in Tables \ref{tab:tab1} and \ref{tab:tab3}, which are stored in the objects wide_df and SnowGR, respectively.

Example 1

The code below constructs a line plot, with points superimposed, of the three rating types (food, service and ambience) in columns 2 to 4 of wide_df against the overall rating in column 5 of wide_df, as shown in Figure 1.

library(ggmatplot)
ggmatplot(x = wide_df[, 5], y = wide_df[, 2:4], plot_type = "both",
          xlab = "Overall rating",  ylab = "Rating", legend_title = "Type")

In contrast, using ggplot2 alone to obtain the same result as Figure 1 requires the data to be wrangled into a long form before plotting. This is exemplified in the code below, which reveals a small but noticeable friction in the workflow of a practitioner who wants to explore their data promptly.

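library(ggplot2)
library(dplyr)   # select(), mutate() and the %>% pipe
library(tidyr)   # pivot_longer()
library(forcats) # fct_inorder(); or load everything with library(tidyverse)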
wide_df %>% 
  select(contains("rating")) %>% 
  pivot_longer(-overall_rating, 
               names_to = "rating_type",
               values_to = "rating") %>% 
  mutate(rating_type = fct_inorder(rating_type)) %>%
  ggplot(aes(x = overall_rating, y = rating, color = rating_type)) + 
  geom_point(aes(shape = rating_type)) +
  geom_line(aes(group = rating_type, linetype = rating_type)) +
  labs(x = "Overall rating", y = "Rating", 
       color = "Type", linetype = "Type", shape = "Type")

Example 2

The code below draws a boxplot of the amount of snowfall for each month, with one boxplot per column of the SnowGR data, as presented in Figure 2. As the resulting object is a ggplot object, the user can use ggplot2 functions to modify the output (e.g., to add a title).

library(ggmatplot)
ggmatplot(x = SnowGR[, 3:14], plot_type = "boxplot",
          xlab = "Month",  ylab = "Snowfall") + 
          ggtitle("Grand Rapids, Michigan, 1893-2011")

The equivalent code to produce Figure 2 without using ggmatplot is given below. Again, we observe a slight but non-negligible friction in putting the data in the right format prior to plotting. The original wide data format seen in Table \ref{tab:tab3} is common in the environmental sciences, among other disciplines, so an analyst who has to repeat these tasks can benefit from the quicker approach that ggmatplot offers.

library(ggplot2)
library(dplyr)   # mutate() and the %>% pipe
library(tidyr)   # pivot_longer()
library(forcats) # fct_inorder(); or load everything with library(tidyverse)
SnowGR %>% 
  pivot_longer(Jul:Jun, 
               names_to = "Month",
               values_to = "Snowfall") %>% 
  mutate(Month = fct_inorder(Month)) %>% 
  ggplot(aes(Month, Snowfall)) + 
  geom_boxplot(alpha = 0.5) + 
  ggtitle("Grand Rapids, Michigan, 1893-2011")
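Because ggmatplot() returns a ggplot object, other plot types combine with ggplot2 layers in the same way. The sketch below is an illustration under stated assumptions: it assumes that "violin" is an accepted value of plot_type in the installed version of ggmatplot (the comparative violin plot mentioned in the Summary), and it uses the built-in iris data rather than SnowGR so that it runs without mosaicData. The object names `iris_num` and `p` are hypothetical.

```r
library(ggplot2)
library(ggmatplot)

# Numeric columns of the built-in iris data, used purely for illustration.
iris_num <- iris[, 1:4]

# One violin per column; plot_type = "violin" is assumed to be available
# in the installed version of ggmatplot.
p <- ggmatplot(x = iris_num, plot_type = "violin",
               xlab = "Measurement", ylab = "Value (cm)")

# The result is an ordinary ggplot object, so ggplot2 functions apply.
p + ggtitle("Iris measurements") + theme_minimal()
```

No reshaping is needed before the call, and the title and theme are added with the same ggplot2 functions used in Example 2.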

Discussion

The ggmatplot R package provides a solution to a common friction encountered when wanting to quickly plot multivariate data, where the primary interest is mapping the column names to an aesthetic element. While this is an important start, we acknowledge that the solution provided is a recipe-driven approach: the user can only produce the plot types that are included in the plot_type option. Future developments of the package could benefit from a grammar approach, as in @Wilkinson2005-oz and @Wickham2010-kt, in which plot types are extensible. The latest development source code can be found at \url{https://github.com/xuan-liang/ggmatplot}. Further examples can be found at \url{https://xuan-liang.github.io/ggmatplot/}.

Acknowledgements

FKCH was supported by an Australian Research Council Discovery Fellowship DE200100435.

References

xuan-liang/ggmatplot documentation built on Jan. 20, 2025, 8:33 p.m.
