We'll start by loading the necessary packages and data sets:

library("dplyr")
library("ggplot2")
data(okcupid, package = "jrTidyverse")

Summarising the data

In this section, we will gradually chain the commands together. We'll start by calculating the average income:

new_data = okcupid %>%
  summarise(ave_income = mean(income))
new_data
  1. Alter the above command to calculate the median income (as well as the mean).
okcupid %>%
  summarise(ave_income = mean(income),
            med_income = median(income))
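
To see why it is worth reporting both statistics, here is a minimal sketch on a made-up tibble. The name toy and its values are invented for illustration and are not part of the okcupid data.

```r
library("dplyr")

# Made-up incomes with one very high earner (illustrative values only)
toy = tibble(income = c(20, 30, 40, 1000))

toy_summary = toy %>%
  summarise(ave_income = mean(income),
            med_income = median(income))
toy_summary
```

With one very large value in the column, the mean (272.5) sits far above the median (35), which is the kind of gap you might expect in a data set with a few high earners.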
  1. Use group_by() to calculate the mean income conditional on the answer to the pets question.
okcupid %>%
  group_by(pets) %>%
  summarise(ave_income = mean(income))
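
As a rough illustration of what group_by() followed by summarise() returns, the sketch below uses a small invented tibble (toy, with hypothetical pets and income values) rather than the real data.

```r
library("dplyr")

# Made-up data: `toy` and its values are illustrative only
toy = tibble(
  pets = c("cats", "cats", "dogs", "dogs", "dogs"),
  income = c(10, 30, 20, 40, 60)
)

# One row per group, with the mean computed within each group
by_pets = toy %>%
  group_by(pets) %>%
  summarise(ave_income = mean(income))
by_pets
```

The result has one row per value of pets: here 20 for cats and 40 for dogs.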
  1. The arrange() function is used to sort a tibble, e.g.

    ... %>% arrange(ave_income)

    will arrange the tibble from smallest to largest. Arrange the tibble from largest to smallest in terms of average income. Remember, you can look up the help page using ?arrange.

(df = okcupid %>%
  group_by(pets) %>%
  summarise(ave_income = mean(income)) %>%
  arrange(desc(ave_income))
)
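
The difference between the two sort orders can be sketched on a tiny invented tibble. The name toy_means and its values are assumptions made for this example.

```r
library("dplyr")

# Made-up group means (illustrative values only)
toy_means = tibble(
  pets = c("cats", "dogs", "none"),
  ave_income = c(20, 40, 30)
)

# Default: smallest to largest
ascending = toy_means %>% arrange(ave_income)

# Wrapping the column in desc() reverses the order
descending = toy_means %>% arrange(desc(ave_income))

ascending$pets
descending$pets
```

Here the ascending order is cats, none, dogs, and desc() gives dogs, none, cats.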
  1. Using ggplot2 and geom_col(), plot your results. Hint: use + coord_flip() to rotate your plot.
ggplot(df) +
  geom_col(aes(x = pets, y = ave_income)) +
  coord_flip()
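
One thing worth knowing: a character column on a discrete axis is drawn in alphabetical order, so the row order produced by arrange() does not carry over to the plot. A possible way to order the bars by height is base R's reorder(), sketched below on invented data (toy_means and its values are assumptions for this example, and this is an optional extra rather than part of the exercise).

```r
library("dplyr")
library("ggplot2")

# Made-up group means (illustrative values only)
toy_means = tibble(
  pets = c("cats", "dogs", "none"),
  ave_income = c(20, 40, 30)
)

# reorder() turns pets into a factor whose levels are sorted by ave_income
ordered_pets = reorder(toy_means$pets, toy_means$ave_income)
levels(ordered_pets)

p = ggplot(toy_means) +
  geom_col(aes(x = reorder(pets, ave_income), y = ave_income)) +
  coord_flip() +
  labs(x = "pets")
p
```

The factor levels come out as cats, none, dogs, so the bars are drawn in order of increasing average income.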

Creating columns with mutate()

  1. The floor() function rounds down to the nearest integer. To round down to the nearest 10, we use the trick

    floor(61 / 10) * 10
    floor(119 / 10) * 10

    Use the mutate() function to create a new column that contains the person's age (to the decade), e.g. 50, 60, 70.

    okcupid %>% mutate(decade = floor(age / 10) * 10)

  2. Since this data set has high earners, use filter() to remove the top 5% of earners. Hint: quantile(income, probs = 0.95) will give you the 95th percentile of income.

okcupid %>%
  mutate(decade = floor(age / 10) * 10) %>%
  filter(income < quantile(income, probs = 0.95))
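
The two steps above can be sketched on a small invented tibble to see what each one does. The name toy and all of its values are made up for illustration. The sketch also contrasts floor() with round(x, -1), which rounds to the nearest 10 instead of rounding down.

```r
library("dplyr")

# Rounding down versus rounding to the nearest 10
floor(c(61, 119) / 10) * 10  # 60 110
round(c(61, 119), -1)        # 60 120

# Made-up data with one very high earner (illustrative values only)
toy = tibble(
  age = c(23, 38, 41, 59, 61),
  income = c(20, 30, 40, 50, 1000)
)

# With R's default quantile method the 95% point of these incomes is 810
cutoff = quantile(toy$income, probs = 0.95)

trimmed = toy %>%
  mutate(decade = floor(age / 10) * 10) %>%
  filter(income < quantile(income, probs = 0.95))
trimmed
```

Only the row with an income of 1000 is dropped, and the remaining ages fall into the decades 20, 30, 40 and 50. Note that quantile() interpolates between observations by default, so the cut-off need not be a value that appears in the data.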
  1. To help with plotting, convert the decade column into a character column using the as.character() function. This can be achieved via mutate(decade = as.character(decade)).
(df = okcupid %>%
  mutate(decade = floor(age / 10) * 10) %>%
  filter(income < quantile(income, probs = 0.95)) %>%
  mutate(decade = as.character(decade))
)
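
A small caveat with the character conversion is that strings sort alphabetically, which matters if any decade has three digits. The sketch below uses made-up decade values to show the effect, along with factor() as one possible alternative (an assumption on our part, not something the exercise asks for).

```r
# Made-up decade values (illustrative only)
decade = c(20, 30, 110)

# As characters, "110" sorts before "20"
sorted_chr = sort(as.character(decade))
sorted_chr

# factor() on a numeric vector keeps the levels in numeric order
decade_fct = factor(decade)
levels(decade_fct)
```

If the alphabetical ordering looks odd on the x-axis, mutate(decade = factor(decade)) may be worth trying instead.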
  1. Use ggplot2 to create boxplots of x = decade and y = income.
ggplot(df) +
  geom_boxplot(aes(x = decade, y = income))
  1. Create facets by using + facet_wrap(~ drugs)
ggplot(df) +
  geom_boxplot(aes(x = decade, y = income)) +
  facet_wrap(~drugs)


jr-packages/jrTidyverse documentation built on Oct. 11, 2020, 9:03 p.m.