knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
We’re going to work with the passerines database, a dataset of external morphology measurements of approximately one-quarter of passerine bird species taken from Ricklefs 2017.
E. Ricklefs, Robert (2017): Passerine morphology: external measurements of approximately one-quarter of passerine bird species. Ecology, 98:1472. 10.1002/ecy.1783.
First we’ll load the data:
library(traitdata) data(passerines)
Next we’ll load the dplyr package:
library(dplyr)
Looking at the data
Data frames look a bit different in dplyr. Above, I called the tbl_df() function on our data. This provides more useful printing of data frames in the console. Ever accidentally printed a massive data frame in the console before? Yeah… this avoids that. You don’t need to change your data to a data frame tbl first — the dplyr functions will automatically convert your data when you call them. This is what the data look like on the console:
tbl_df(passerines)
dplyr also provides a function glimpse() that makes it easy to look at our data in a transposed view. It’s similar to the str() (structure) function, but has a few advantages (see ?glimpse).
glimpse(passerines)
Selecting columns
select() lets you subset by columns. This is similar to subset() in base R, but it also allows for some fancy use of helper functions such as contains(), starts_with() and, ends_with(). I think these examples are self explanatory, so I’ll just include them here:
select(passerines, Length, Wing, Tail) %>% head() select(passerines, HN:AI) %>% head() select(passerines, 1:3) %>% head()
Filtering rows
filter() lets you subset by rows. You can use any valid logical statements:
filter(passerines, Length > 30)[ , 1:3] %>% head() filter(passerines, scientificNameStd == "Passer domesticus") %>% head() filter(passerines, Length > 30 & AU == 1) %>% head()
Arranging rows
arrange() lets you order the rows by one or more columns in ascending or descending order. I’m selecting the first three columns only to make the output easier to read:
arrange(passerines, Length)[ , 1:3] %>% head() arrange(passerines, desc(Length))[ , 1:3] %>% head() arrange(passerines, Length, Wing, Tail)[, 1:3] %>% head()
Summarising columns
Finally, summarise() lets you calculate summary statistics. On its own summarise() isn’t that useful, but when combined with group_by() you can summarise by chunks of data. This is similar to what you might be familiar with through ddply() and summarise() from the plyr package:
summarise(passerines, mean_length = mean(Length, na.rm = TRUE)) head(summarise(group_by(passerines, Family), mean_length = mean(Length, na.rm = TRUE)))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.