The explore package simplifies Exploratory Data Analysis (EDA). Get faster insights with less code! We will use fewer than 10 lines of code and just 6 functions to explore the penguins dataset:
| function | package | description |
|------------------|-----------|-----------------------------------------|
| `library()` | {base} | load a package |
| `filter()` | {dplyr} | subset rows using column values |
| `describe()` | {explore} | describe variables of the table |
| `explore()` | {explore} | explore graphically a variable |
| `explore_all()` | {explore} | explore all variables of the table |
| `explain_tree()` | {explore} | explain a target using a decision tree |
The `penguins` dataset comes with the palmerpenguins package (https://github.com/allisonhorst/palmerpenguins). It has 344 observations and 8 variables.
Furthermore, we use the packages {dplyr} for `filter()` and `%>%`, and {explore} for data exploration.
library(dplyr)
library(explore)
penguins <- use_data_penguins()
# equivalent to
# penguins <- palmerpenguins::penguins
penguins %>% describe()
There are some `NA` values (unknown values) in the data. The variable with the most NAs is `sex`. `flipper_length_mm` and several other variables have NAs in only 2 observations.
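As a cross-check on NA counts like these, missing values per column can also be counted with base R. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the penguins data) so that it runs without any extra packages.

```r
# Hypothetical toy data, not the real penguins table
toy <- data.frame(
  flipper_length_mm = c(181, NA, 195, 210),
  sex = c("male", NA, NA, "female")
)

# is.na() marks each missing cell; colSums() adds them up per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

The same call on the real table, `colSums(is.na(penguins))`, should agree with the NA column reported by `describe()`.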
We use only penguins with known flipper length for the data exploration!
data <- penguins %>% filter(flipper_length_mm > 0)
This reduces the data from 344 to 342 penguins.
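The comparison `flipper_length_mm > 0` removes the NA rows because `NA > 0` evaluates to `NA`, and `filter()` keeps only rows where the condition is `TRUE`. A minimal sketch on a made-up data frame (`toy` is hypothetical) shows this next to a more explicit `is.na()` test:

```r
library(dplyr)

# Hypothetical toy data with one missing flipper length
toy <- data.frame(flipper_length_mm = c(181, NA, 195))

# NA > 0 is NA, so the row with the missing value is dropped
kept_by_comparison <- toy %>% filter(flipper_length_mm > 0)

# A more explicit way to state the same intent
kept_by_is_na <- toy %>% filter(!is.na(flipper_length_mm))

nrow(kept_by_comparison)
nrow(kept_by_is_na)
```

Both variants keep the same rows here. The `is.na()` form may be easier to read when the goal is only to drop unknown values.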
data %>% explore_all(color = "skyblue")
What is the relationship between all the variables and species?
data %>% explore_all( target = species, color = c("darkorange", "purple", "lightseagreen"))
We already see some strong patterns in the data. `flipper_length_mm` separates Gentoo from the other species, and `bill_length_mm` separates Adelie from Chinstrap. We also see that Chinstrap and Gentoo are located on separate islands.
Now we explain species using a decision tree:
data %>% explain_tree(target = species)
We found a simple way to determine the species using just `flipper_length_mm` and `bill_length_mm`:
- If `flipper_length_mm >= 207`, it is a Gentoo penguin (95% right)
- If `flipper_length_mm < 207` and `bill_length_mm < 43`, it is an Adelie penguin (97% right)
- If `flipper_length_mm < 207` and `bill_length_mm >= 43`, it is a Chinstrap penguin (92% right)

Now let's take a closer look at these variables:
data %>% explore( flipper_length_mm, bill_length_mm, target = species, color = c("darkorange", "purple", "lightseagreen") )
The plot shows a good, though not perfect, separation between the 3 species!
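The three rules read off the decision tree can also be written as a small base R function. This is only a sketch: `classify_penguin()` is a hypothetical helper, not part of {explore}, the thresholds are the ones quoted in the text, and the example measurements are made up.

```r
# Hypothetical helper that applies the three tree rules from the text
classify_penguin <- function(flipper_length_mm, bill_length_mm) {
  ifelse(
    flipper_length_mm >= 207, "Gentoo",
    ifelse(bill_length_mm < 43, "Adelie", "Chinstrap")
  )
}

# Made-up measurements, one per rule
predicted <- classify_penguin(
  flipper_length_mm = c(215, 190, 195),
  bill_length_mm = c(47, 39, 49)
)
print(predicted)
```

Because the rules are right for only about 92% to 97% of penguins in each branch, such a function will misclassify some individuals, in line with the imperfect separation seen in the plot.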