We've already covered a fair bit of the tidyverse in the Intro to R course, namely tibbles, dplyr and ggplot2. This chapter is a general recap of what we touched upon.
A tibble (or data frame) is how we store a sheet of data. A standard tibble looks like this:
data(example, package = "jrTidyverse")
example
dplyr is a package for manipulating tibbles. We covered several functions, such as filter() and summarise():
library("dplyr")
## Give me all the rows where gender is "Male"
filter(example, gender == "Male")
## What is the average of the age variable
summarise(example, av_age = mean(age))
To keep only the rows where age > 24:
filter(example, age > 24)
To keep only the rows where respond is TRUE:
filter(example, respond)
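The filtering ideas above can be tried without the jrTidyverse data. The following is a minimal sketch on a small hand-made tibble; the name toy and all of its values are invented for illustration and are not taken from the example data. It relies on tibble() being re-exported by dplyr.

```r
library("dplyr")

# Hypothetical stand-in for the example data (values invented for illustration)
toy = tibble(
  gender = c("Male", "Female", "Male", "Female"),
  age = c(22, 31, 27, 19),
  respond = c(TRUE, FALSE, FALSE, TRUE)
)

# Rows where age is above 24
older = filter(toy, age > 24)

# A logical column can be used directly as the condition
responders = filter(toy, respond)

# Conditions separated by commas must all hold for a row to be kept
older_males = filter(toy, gender == "Male", age > 24)
```

Because respond is already TRUE or FALSE, writing filter(toy, respond) is equivalent to filter(toy, respond == TRUE).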
We can pass outputs to the first argument of the next function using the piping operator, %>%:
# Give me the average age of males
example %>%
  filter(gender == "Male") %>%
  summarise(av_age = mean(age))
example %>%
  filter(!respond) %>%
  summarise(av_age = mean(age))
The piping operator can be used with any function, not just those in dplyr:
# Pass 1:5 on the left as the first argument to mean
1:5 %>% mean(na.rm = TRUE)
# Explicitly pass 1:5 into the function
mean(1:5, na.rm = TRUE)
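The benefit of piping is easier to see once more than one function is involved. A small sketch, using a made-up vector x, comparing a nested call with the equivalent chain:

```r
library("dplyr")

# Made-up vector with a missing value
x = c(1, 4, NA, 7)

# A single step: both forms are the same call
piped = x %>% mean(na.rm = TRUE)
direct = mean(x, na.rm = TRUE)

# Two steps: the nested form reads inside-out ...
nested = sum(sqrt(x), na.rm = TRUE)
# ... while the piped form reads left to right
chained = x %>%
  sqrt() %>%
  sum(na.rm = TRUE)
```

In both pairs the two versions give the same result; the pipe only changes the order in which the code is read.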
We can apply functions to groups within variables using group_by():
# Give me the average age of each group within gender
example %>%
  group_by(gender) %>%
  summarise(av_age = mean(age))
example %>%
  group_by(respond) %>%
  summarise(av_age = mean(age))
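As a rough illustration of what group_by() followed by summarise() returns, here is a sketch on a hand-made tibble. The name toy and its values are invented, and the extra n() column is an addition to show that several summaries can be computed at once.

```r
library("dplyr")

# Hypothetical data (values invented for illustration)
toy = tibble(
  gender = c("Male", "Female", "Male", "Female"),
  age = c(22, 31, 27, 19)
)

# One output row per group, one column per summary
by_gender = toy %>%
  group_by(gender) %>%
  summarise(av_age = mean(age), n = n())
```

The result has one row for each value of gender, so here it has two rows rather than the four in toy.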
ggplot2 is a fantastic package for graphics. The ggplot() function creates a ggplot2 object.
library("ggplot2")
data(movies, package = "jrTidyverse")
ggplot(movies)
To add axes to this we add aesthetics:
# visible in figure 2.1
g = ggplot(movies, aes(x = duration, y = rating))
g
Notice we can save plots as variables.
Then to add information onto the graph we use geoms
# figure 2.2
g + geom_point()
ggplot(movies, aes(x = rating)) + geom_histogram()
ggplot(movies, aes(x = classification)) + geom_bar()
ggplot(movies, aes(x = classification, y = rating)) + geom_boxplot()
# Save each plot, then arrange them in a 2 x 2 grid (needs the gridExtra package)
p1 = g + geom_point()
p2 = ggplot(movies, aes(x = rating)) + geom_histogram()
p3 = ggplot(movies, aes(x = classification)) + geom_bar()
p4 = ggplot(movies, aes(x = classification, y = rating)) + geom_boxplot()
gridExtra::grid.arrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
movies %>%
  group_by(year) %>%
  summarise(av_rat = mean(rating))
A summary like this can be piped straight into ggplot() and drawn with geom_line():
movies %>%
  group_by(year) %>%
  summarise(av_rat = mean(rating)) %>%
  ggplot(aes(x = year, y = av_rat)) +
  geom_line()
Axis labels and a title are added with labs():
movies %>%
  group_by(year) %>%
  summarise(av_rat = mean(rating)) %>%
  ggplot(aes(x = year, y = av_rat)) +
  geom_line() +
  labs(x = "Year", y = "Average Rating",
       title = "Average rating per year of movies")
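The same summarise-then-plot pattern can be tried without the movies data. This is only a sketch: toy_movies and its values are invented for illustration, and the plot is saved to a variable rather than drawn.

```r
library("dplyr")
library("ggplot2")

# Hypothetical ratings (values invented for illustration)
toy_movies = tibble(
  year = c(2000, 2000, 2001, 2001, 2002),
  rating = c(6, 8, 5, 7, 9)
)

# Step 1: one average rating per year
yearly = toy_movies %>%
  group_by(year) %>%
  summarise(av_rat = mean(rating))

# Step 2: build the line plot from the summary and label it
p = ggplot(yearly, aes(x = year, y = av_rat)) +
  geom_line() +
  labs(x = "Year", y = "Average Rating",
       title = "Average rating per year (toy data)")
```

Typing p at the console draws the plot. Keeping the summary in its own variable, yearly, makes it easy to inspect the numbers before plotting them.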