In UBC-MDS/slimreda: Exploratory Data Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)

library(slimreda)

slimreda focuses on unique value and missing value counts, as well as plots such as histograms and correlation maps. The results are generated as charts or images, which makes them easier for users to reference in their EDA work.

Let's explore the functions in slimreda one at a time.

Histogram

Suppose you would like to plot the distribution of certain columns in your data frame as histograms. Instead of writing multiple code chunks with duplicate ggplot code, you can use the histogram function to plot histograms for as many columns as you would like.

In the example below, we generate two histograms for two columns in the penguins data frame, namely body_mass_g and flipper_length_mm. We use plot_grid to render these plots on the same row, but you can plot them directly.

library(palmerpenguins)
library(cowplot)

hist_plots <- slimreda::histogram(penguins, c('body_mass_g', 'flipper_length_mm'))

cowplot::plot_grid(plotlist = hist_plots, nrow = 1)
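For context, the kind of repetition that histogram is meant to replace can be sketched with plain ggplot2. This is only an illustration under stated assumptions, not slimreda's implementation: the small data frame df, the bin count, and the names cols and plots are all made up for the example.

```r
library(ggplot2)

# Hypothetical toy data; the real vignette uses palmerpenguins::penguins.
df <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 5000),
  flipper_length_mm = c(181, 186, 195, 193, 190, 217)
)
cols <- c("body_mass_g", "flipper_length_mm")

# Build one histogram per column instead of copying the ggplot code by hand.
plots <- lapply(cols, function(col) {
  ggplot(df, aes(x = .data[[col]])) +
    geom_histogram(bins = 10) +
    labs(x = col)
})
names(plots) <- cols

# Each list element is an ordinary ggplot object that can be printed on its own.
plots[["body_mass_g"]]
```

A list of plots in this shape can be passed to cowplot::plot_grid through its plotlist argument, or each element can be printed individually.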

miss_count

This function reports the number of missing values in a data frame along with the corresponding percentage. It takes two parameters: df, the data frame you want to analyze, and ascending, a boolean that determines whether the output is sorted in ascending or descending order.

Below is an example for this function:

example_miss_count <- data.frame(
  name = c(NA, NA, "Jessica"),
  age = c(NA, 21, 30),
  hobby = c("lab", "quiz", "swim")
)

output <- slimreda::miss_count(example_miss_count,
                               ascending = TRUE)

output
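To make the counts and percentages concrete, here is a minimal base R sketch of the same kind of summary. It is not slimreda's implementation, and the column names column, n_missing, and pct_missing are assumptions chosen for this example; the package's own output may be laid out differently.

```r
# Same toy data as the example above, repeated so the block runs on its own.
df <- data.frame(
  name = c(NA, NA, "Jessica"),
  age = c(NA, 21, 30),
  hobby = c("lab", "quiz", "swim")
)

# Count NA values per column and express them as a percentage of rows.
n_missing <- colSums(is.na(df))
summary_df <- data.frame(
  column = names(n_missing),
  n_missing = as.integer(n_missing),
  pct_missing = round(100 * as.numeric(n_missing) / nrow(df), 2)
)

# Sort by missing count; use decreasing = TRUE for descending order.
summary_df <- summary_df[order(summary_df$n_missing, decreasing = FALSE), ]
rownames(summary_df) <- NULL

summary_df
```

With three rows in the data frame, one missing value corresponds to 33.33 percent and two to 66.67 percent, so hobby sorts first and name last in ascending order.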

cat_unique_count

The cat_unique_count function comes in handy when you are interested in the number of unique values in every categorical column of your data frame. Instead of repeating the same line of code and editing only the column name, you get all the categorical features and their unique value counts returned as a single data frame.

In the example below, we generate the unique value counts for all categorical features in the penguins data frame, namely species, island and sex. We use knitr::kable to render the data frame into a table.

unique_cat_df <- slimreda::cat_unique_count(penguins)

knitr::kable(unique_cat_df, "simple")
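The idea behind such a summary can be sketched in base R. This is a hedged illustration rather than the package's code: it assumes that factor and character columns count as categorical, it ignores NA when counting, and the data frame df and the output column names are hypothetical.

```r
# Hypothetical toy data with two categorical columns and one numeric column.
df <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Adelie", "Chinstrap")),
  island = c("Torgersen", "Biscoe", "Dream", "Dream"),
  body_mass_g = c(3750, 5000, 3800, 3700),
  stringsAsFactors = FALSE
)

# Assumption: factor and character columns are the categorical ones.
is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))

# Count distinct non-missing values in each categorical column.
counts <- vapply(
  df[is_cat],
  function(x) length(unique(x[!is.na(x)])),
  integer(1)
)

unique_counts <- data.frame(
  column = names(df)[is_cat],
  unique_count = unname(counts)
)

unique_counts
```

Looping over the columns with vapply is what removes the need to copy the same counting line once per column.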

corr_map

Now suppose you would like to see the correlation between some columns in your data frame as a correlation map showing the pairwise correlation strength. Instead of writing lines of duplicate ggplot code, you can use the corr_map function.

In the example below, we generate a simple correlation map for all the numeric columns in the penguins data frame. The color encodes the correlation, which ranges from -1 to 1, and the output is a ggplot object that can be modified later.

corr_map_plot <- slimreda::corr_map(penguins, colnames(penguins))

corr_map_plot
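The numbers behind a correlation map can be computed with base R alone. The sketch below is an assumption-laden illustration, not how corr_map is implemented: it uses a made-up data frame df, keeps only numeric columns, and uses pairwise complete observations so that missing values do not remove whole rows.

```r
# Hypothetical toy data; the non-numeric column is dropped before correlating.
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 6, 8, 10),
  z = c(5, 4, 3, 2, 1),
  label = c("a", "b", "c", "d", "e")
)

# Keep numeric columns only.
num_df <- df[vapply(df, is.numeric, logical(1))]

# Pairwise Pearson correlations, each between -1 and 1.
corr_mat <- cor(num_df, use = "pairwise.complete.obs")

# Long format: one row per pair of columns, convenient for a tile plot.
corr_long <- as.data.frame(as.table(corr_mat), responseName = "correlation")

corr_mat
```

A long table like corr_long, with one row per column pair, is the shape a ggplot2 tile layer such as geom_tile expects when the fill color is mapped to the correlation value.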

UBC-MDS/slimreda documentation built on Feb. 7, 2022, 9:12 a.m.
