knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
library(slimreda)
slimreda focuses on unique value and missing value counts, as well as graphs such as histograms and correlation maps. The generated results are designed as charts or images, which helps users reference their EDA results more flexibly.
Let's explore the functions in slimreda one at a time.
Suppose you would like to plot the distribution of certain columns in your data frame as histograms. Instead of writing multiple code chunks with duplicate ggplot code, you can use the histogram function to plot histograms for as many columns as you would like.
In the example below, we generate two histograms for two columns in the penguins data frame, namely body_mass_g and flipper_length_mm. We use plot_grid to render these plots on the same row, but you can also plot them directly.
library(palmerpenguins)
library(cowplot)

hist_plots <- slimreda::histogram(penguins, c('body_mass_g', 'flipper_length_mm'))
cowplot::plot_grid(plotlist = hist_plots, nrow = 1)
With the miss_count function, you can find the number of missing values and the corresponding percentage for a data frame. It takes two parameters: df is the data frame you want to analyze, and ascending is a boolean value that decides whether the result is sorted in ascending or descending order.
Below is an example of this function:
example_miss_count <- data.frame(
  name = c(NA, NA, "Jessica"),
  age = c(NA, 21, 30),
  hobby = c("lab", "quiz", "swim")
)
output <- slimreda::miss_count(example_miss_count, ascending = TRUE)
output
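To make the idea of a missing-value summary concrete, here is a minimal base R sketch of the same kind of computation. It is not slimreda's implementation: the helper name na_summary and the output column names (column, missing_count, missing_pct) are assumptions chosen for illustration, and the actual output of miss_count may be laid out differently.

```r
# Hypothetical helper, not part of slimreda: summarize missing values per column.
na_summary <- function(df, ascending = FALSE) {
  counts <- colSums(is.na(df))  # number of NA entries in each column
  out <- data.frame(
    column = names(counts),
    missing_count = as.integer(counts),
    missing_pct = as.numeric(counts) / nrow(df) * 100,
    stringsAsFactors = FALSE
  )
  # Sort by the number of missing values; decreasing = TRUE gives descending order
  out <- out[order(out$missing_count, decreasing = !ascending), ]
  rownames(out) <- NULL
  out
}

example_df <- data.frame(
  name = c(NA, NA, "Jessica"),
  age = c(NA, 21, 30),
  hobby = c("lab", "quiz", "swim")
)
na_result <- na_summary(example_df, ascending = TRUE)
na_result
```

With ascending = TRUE, the column with the fewest missing values comes first in this sketch.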
The cat_unique_count function comes in handy when you are interested in the number of unique values in every categorical column of your data frame. With this function, you can avoid duplicating the same line of code just to change the column name, and get all the categorical features and their unique value counts returned as a data frame.
In the example below, we generate the unique value counts for all categorical features in the penguins data frame, namely species, island and sex. We use knitr::kable to render the data frame into a table.
unique_cat_df <- slimreda::cat_unique_count(penguins)
knitr::kable(unique_cat_df, "simple")
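For comparison, the sketch below shows one way to count unique values per categorical column in base R. It is an illustration only, not slimreda's code: the helper name unique_counts and the output column names are hypothetical, and treating factor and character columns as categorical while ignoring NA values are assumptions that may differ from what cat_unique_count does.

```r
# Hypothetical helper, not part of slimreda: unique value counts for categorical columns.
unique_counts <- function(df) {
  # Assumption: factor and character columns are treated as categorical
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col), logical(1))
  cat_cols <- names(df)[is_cat]
  data.frame(
    column = cat_cols,
    # Assumption: NA is not counted as a distinct value
    unique_values = vapply(
      cat_cols,
      function(nm) length(unique(stats::na.omit(df[[nm]]))),
      integer(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

toy_penguins <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", NA),
  island = factor(c("Biscoe", "Dream", "Dream", "Dream")),
  body_mass_g = c(3750, 5000, 3800, 4200),
  stringsAsFactors = FALSE
)
unique_result <- unique_counts(toy_penguins)
unique_result
```

The numeric column body_mass_g is skipped, so the result has one row per categorical column.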
Now suppose you would like to see the correlation between some columns in your data frame as a correlation map showing the pairwise correlation strength. Instead of writing lines of duplicate ggplot code, you can use the corr_map function.
In the example below, we generate a simple correlation map for all the numeric columns in the penguins data frame. The color indicates the correlation, ranging from -1 to 1, and the output is a ggplot object that can be modified later.
corr_map_plot <- slimreda::corr_map(penguins, colnames(penguins))
corr_map_plot
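The numbers behind a correlation map can be sketched in base R as follows. This is an assumption about the general technique (pairwise Pearson correlations reshaped to one row per pair of columns, the layout a heatmap typically needs), not a description of how corr_map works internally. The helper name pairwise_cor_long and its output column names are hypothetical.

```r
# Hypothetical helper, not part of slimreda: pairwise correlations in long format.
pairwise_cor_long <- function(df) {
  # Keep only numeric columns; other columns cannot be correlated this way
  num_df <- df[vapply(df, is.numeric, logical(1))]
  # Pearson correlation; rows with NA are dropped pair by pair
  cor_mat <- stats::cor(num_df, use = "pairwise.complete.obs")
  # Reshape the square matrix to one row per (var1, var2) pair
  long <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
  names(long) <- c("var1", "var2", "correlation")
  long
}

toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8),
  z = c(4, 3, 2, 1),
  label = c("a", "b", "c", "d")
)
cor_long <- pairwise_cor_long(toy)
cor_long
```

Each value in the correlation column lies between -1 and 1, which is the range the color scale of a correlation map encodes.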