Exploratory data analysis (EDA) is one of the key steps in a machine learning
project. This package aims to make this process easy by providing R
functions, built on the ggplot2 package, that draw four common types of
plots with the magma color scheme. To maximize interpretability, each
plot uses a color scheme (discrete, diverging, or sequential) suited to
the kind of data it shows.
The development version of the package can be installed from GitHub with:
# install.packages("devtools")
devtools::install_github("UBC-MDS/magmavizR")
An interactive version of the documentation can be found here.
The magmavizR package and the example penguins dataset used below can be loaded with the following commands:
library("magmavizR")
penguins_data <- palmerpenguins::penguins
The four data visualization functions included in the package, along with their usage, are outlined below:
Returns a boxplot based on the data frame, a numerical feature whose distribution is shown, and a categorical feature used to group the data. Additionally, a boolean option facets the boxplots into separate charts.
boxplot(penguins_data, species, bill_length_mm, facet = TRUE)
Returns a correlation plot based on the numerical features present in the data frame. Additionally, it can print the correlated numerical feature pairs along with their correlation values.
corrplot(penguins_data, print_corr = TRUE, title = "Correlation Plot")
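To make the idea of "correlated feature pairs" concrete, here is a minimal base R sketch of how such pairs can be listed. It is not the package's implementation: the built-in iris data stands in for the penguins data, and the names num, cm, idx and corr_pairs are chosen only for this illustration.

```r
# Keep only the numeric columns (iris stands in for the penguins data).
num <- iris[sapply(iris, is.numeric)]

# Pairwise Pearson correlations, skipping missing values pair by pair.
cm <- cor(num, use = "pairwise.complete.obs")

# The matrix is symmetric, so the upper triangle lists each pair once.
idx <- which(upper.tri(cm), arr.ind = TRUE)
corr_pairs <- data.frame(
  feature_1   = rownames(cm)[idx[, "row"]],
  feature_2   = colnames(cm)[idx[, "col"]],
  correlation = cm[idx]
)

# Show the strongest relationships (positive or negative) first.
corr_pairs <- corr_pairs[order(-abs(corr_pairs$correlation)), ]
print(corr_pairs, row.names = FALSE)
```

Sorting by absolute value keeps strong negative correlations near the top, which is usually what matters when scanning for redundant features.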
Returns a histogram based on the data frame and a numeric feature to plot on the x-axis. The y-axis displays the result of one of the following aggregating functions:
count
ncount
density
ndensity
width
histogram(penguins_data, bill_length_mm, "..count..")
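The aggregating functions listed above match the variables that ggplot2 computes for each histogram bin, and "..count.." is the older ggplot2 notation for after_stat(count). The sketch below uses plain ggplot2 rather than magmavizR to show what those values look like. The built-in faithful dataset and the choice of 20 bins are assumptions made only for this illustration.

```r
library(ggplot2)

# Plain ggplot2 histogram with the y-axis mapped to a computed statistic.
# after_stat(density) is the current spelling of "..density..".
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)

# layer_data() returns the per-bin values computed by the histogram stat.
bins <- layer_data(p, 1)
head(bins[, c("xmin", "xmax", "count", "ncount", "density", "ndensity")])
```

In this output, count is the number of observations per bin, density rescales the counts so that the bars integrate to 1, and ncount and ndensity rescale count and density so that the tallest bar equals 1.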
Returns a scatterplot based on the data frame and two numerical feature names passed as the required inputs. Auxiliary inputs provide the flexibility to:
Color code or change the shape of the data points based on a categorical variable
Set titles for the plot, x-axis, y-axis, and color legend
Change the opacity and size of the data points
Set the scales of the x-axis and y-axis to start from zero
scatterplot(penguins_data, bill_length_mm, flipper_length_mm, species, "Bill and Flipper length clusters by Species", 0.5, 2.5, "Bill length (mm)", "Flipper length (mm)", "", FALSE, FALSE, TRUE)
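For comparison, a similar chart written by hand in plain ggplot2 might look roughly like the sketch below. This is an illustration, not the code magmavizR runs internally: the built-in iris data stands in for the penguins data, and the opacity (0.5) and point size (2.5) are assumed to correspond to the two numeric values in the call above.

```r
library(ggplot2)

# iris stands in for the penguins data so the sketch needs no extra package.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(alpha = 0.5, size = 2.5) +
  # Discrete magma palette provided by ggplot2's viridis scales.
  scale_color_viridis_d(option = "magma") +
  labs(
    title = "Sepal and petal length clusters by species",
    x = "Sepal length (cm)",
    y = "Petal length (cm)",
    color = "Species"
  )
p
```

Each layer and label has to be spelled out in this form, which is the kind of detail a single higher-level function call is meant to hide.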
Our package builds on the existing features of ggplot2 using the magma
color scheme. It serves as an automated plotter and is a higher-level
implementation of ggplot2. Essentially, it removes the need to code
every single detail and lets the user focus on the output. We came
across two packages on CRAN that follow a similar line of thought:
quickplot - also a high-level package based on ggplot2 that generates plots modularly.
BoutrosLab.plotting.general - has the same motivation as this package: it plots at a high level with a
standard format. However, it does not use ggplot2.
The primary contributors to this package are:
We welcome new ideas and contributions. Please refer to the contributing guidelines in the CONTRIBUTING.MD file. Do note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
magmavizR was created by Abdul Moid Mohammed, Mukund Iyer, Irene Yan, and
Rubén De la Garza Macías. It is licensed under the terms of the MIT
license.
magmavizR was created using the tutorial in The Whole Game.