Explore penguins

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

How to explore the penguins dataset using the explore package.

The explore package simplifies Exploratory Data Analysis (EDA). Get faster insights with less code! We will use fewer than 10 lines of code and just 6 function names to explore penguins:

| function         | package   | description                             |
|------------------|-----------|-----------------------------------------|
| library()        | {base}    | load a package                          |
| filter()         | {dplyr}   | subset rows using column values         |
| describe()       | {explore} | describe variables of the table         |
| explore()        | {explore} | explore graphically a variable          |
| explore_all()    | {explore} | explore all variables of the table      |
| explain_tree()   | {explore} | explain a target using a decision tree  |

The penguins dataset comes with the palmerpenguins package (https://github.com/allisonhorst/palmerpenguins). It has 344 observations and 8 variables.

Furthermore, we use the packages {dplyr} for filter() and %>% and {explore} for data exploration.

library(dplyr)
library(explore)
penguins <- use_data_penguins()
# equivalent to 
# penguins <- palmerpenguins::penguins

Describe variables

penguins %>% describe()

There are some NA values (missing values) in the data. The variable with the most NAs is sex. Other variables, such as flipper_length_mm, contain NAs in only 2 observations.
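To double-check the NA counts outside of describe(), a minimal base R sketch can tally missing values per column. It assumes the palmerpenguins package is installed and reads the data directly from it; the name `na_counts` is only illustrative.

```r
# Assumes the palmerpenguins package is installed.
penguins <- palmerpenguins::penguins

# Number of missing values per column, sorted so the column
# with the most NAs comes first
na_counts <- sort(colSums(is.na(penguins)), decreasing = TRUE)
print(na_counts)
```

This is only a cross-check; describe() reports the same information together with further summary statistics.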

Data cleaning

We use only penguins with known flipper length for the data exploration!

data <- penguins %>% 
  filter(flipper_length_mm > 0)

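This reduces the dataset from 344 to 342 penguins: for a missing flipper length the condition evaluates to NA, and filter() keeps only rows where the condition is TRUE.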
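Why a comparison such as `flipper_length_mm > 0` removes missing values may not be obvious, so here is a small sketch on a hypothetical toy table (`toy` and its values are made up for illustration, not taken from penguins). It compares the comparison-based filter with an explicit `!is.na()` filter.

```r
library(dplyr)

# Hypothetical toy table: two known values and one missing value
toy <- tibble(id = 1:3, flipper_length_mm = c(181, NA, 195))

# NA > 0 evaluates to NA, and filter() keeps only rows where the
# condition is TRUE, so the row with the missing value is dropped
kept_by_comparison <- toy %>% filter(flipper_length_mm > 0)

# The same selection, with the intent stated explicitly
kept_by_is_na <- toy %>% filter(!is.na(flipper_length_mm))

print(kept_by_comparison)
print(kept_by_is_na)
```

Both variants select the same rows here; the explicit `!is.na()` form may be easier to read when the goal is only to drop missing values.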

Explore variables

data %>% 
  explore_all(color = "skyblue")

Which species?

What is the relationship between all the variables and species?

data %>% 
  explore_all(
    target = species,
    color = c("darkorange", "purple", "lightseagreen"))

We already see some strong patterns in the data: flipper_length_mm separates Gentoo from the other species, and bill_length_mm separates Adelie from Chinstrap. We also see that Chinstrap and Gentoo are located on separate islands.
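The patterns read off the plots can also be checked numerically. The following base R sketch computes the mean flipper and bill length per species and cross-tabulates species against island. It assumes the palmerpenguins package is installed and uses the unfiltered data so that it runs on its own; `species_means` and `species_by_island` are illustrative names.

```r
# Assumes the palmerpenguins package is installed.
penguins <- palmerpenguins::penguins

# Mean flipper and bill length per species
# (aggregate() with a formula drops rows containing NA by default)
species_means <- aggregate(
  cbind(flipper_length_mm, bill_length_mm) ~ species,
  data = penguins,
  FUN = mean
)
print(species_means)

# Cross-tabulate species against island to see which species share an island
species_by_island <- table(penguins$species, penguins$island)
print(species_by_island)
```

Group means are a coarse summary and do not show how much the species overlap, so they complement the plots rather than replace them.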

Now we explain species using a decision tree:

data %>% explain_tree(target = species)

The tree gives a simple rule for identifying the species using just flipper_length_mm and bill_length_mm.
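For readers who want to inspect the split rules as text, a comparable tree can be fitted directly with the rpart package. This is a rough sketch and not necessarily what explain_tree() does internally: the formula is restricted by hand to the two variables named above, rpart's default settings are an assumption, and the resulting splits may differ from the plotted tree.

```r
library(rpart)

# Assumes the palmerpenguins package is installed.
penguins <- palmerpenguins::penguins

# Keep only penguins with a known flipper length, as in the cleaning step
data <- penguins[!is.na(penguins$flipper_length_mm), ]

# Classification tree on the two variables highlighted in the text
# (default rpart settings; an assumption, not explain_tree()'s configuration)
fit <- rpart(
  species ~ flipper_length_mm + bill_length_mm,
  data = data,
  method = "class"
)

# Text representation of the split rules
print(fit)

# Share of penguins the tree assigns to the correct species (training data)
accuracy <- mean(predict(fit, data, type = "class") == data$species)
print(accuracy)
```

The accuracy is measured on the same data the tree was fitted on, so it is an optimistic estimate of how well the rule would generalize.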

Now let's take a closer look at these variables:

data %>% 
  explore(
    flipper_length_mm, bill_length_mm, 
    target = species,
    color = c("darkorange", "purple", "lightseagreen")
    )

The plot shows a good, though not perfect, separation between the 3 species!




explore documentation built on Sept. 11, 2024, 7:40 p.m.