We'll start by loading the necessary packages and data sets:

    library("dplyr")
    library("ggplot2")
    data(okcupid, package = "jrTidyverse")
In this section, we will gradually chain the commands together. We'll start things off by calculating the average income:

    new_data = okcupid %>%
      summarise(ave_income = mean(income))
    new_data

We can calculate several summaries in a single call, for example the mean and the median income:

    okcupid %>%
      summarise(ave_income = mean(income), med_income = median(income))
Use group_by() to calculate the mean income conditional on the answer to the pets question.

    okcupid %>%
      group_by(pets) %>%
      summarise(ave_income = mean(income))
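To see what group_by() followed by summarise() returns, here is a minimal sketch on a tiny made-up tibble. The toy data and its values are purely illustrative assumptions and are not taken from okcupid.

```r
library("dplyr")

# Hypothetical toy data standing in for okcupid (values are made up)
toy = tibble(
  pets = c("cats", "cats", "dogs", "dogs"),
  income = c(10, 30, 40, 60)
)

# group_by() marks the groups; summarise() then returns one row per group
by_pets = toy %>%
  group_by(pets) %>%
  summarise(ave_income = mean(income))
by_pets
```

The result has one row for each distinct value of pets, rather than the single row we get without group_by().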
The arrange() function is used to sort a tibble, e.g.

    ... %>%
      arrange(ave_income)

will arrange the tibble from smallest to largest. Arrange the tibble from largest to smallest in terms of average income. Remember, you can look up the help page using ?arrange.
    (df = okcupid %>%
      group_by(pets) %>%
      summarise(ave_income = mean(income)) %>%
      arrange(desc(ave_income))
    )
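The effect of wrapping a column in desc() can be checked on a small example. This is only a sketch: the tibble below is hypothetical and its numbers are invented for illustration.

```r
library("dplyr")

# Hypothetical summary table (values are made up)
toy = tibble(
  pets = c("cats", "dogs", "none"),
  ave_income = c(20, 50, 35)
)

# Default ordering: smallest to largest
asc = toy %>% arrange(ave_income)

# desc() reverses the ordering: largest to smallest
dsc = toy %>% arrange(desc(ave_income))

asc$pets
dsc$pets
```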
Use geom_col() to plot your results. Hint: use + coord_flip() to rotate your plot.

    ggplot(df) +
      geom_col(aes(x = pets, y = ave_income)) +
      coord_flip()
mutate()

The floor() function rounds down to the nearest integer. To round down to the nearest multiple of $10$, we use the trick

    floor(61 / 10) * 10
    floor(119 / 10) * 10
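Note that this trick always rounds down, which is what we want when binning ages into decades. The short base R sketch below contrasts it with round(), which rounds to the nearest 10 instead. The ages vector is an arbitrary example.

```r
# Arbitrary example ages
ages = c(61, 119, 50, 29)

# Divide by 10, round down, then multiply back: always rounds down
decade_down = floor(ages / 10) * 10

# round() with digits = -1 rounds to the nearest 10 instead
nearest_ten = round(ages, digits = -1)

decade_down
nearest_ten
```

For example, 119 becomes 110 with the floor() trick but 120 with round().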
Use the mutate() function to create a new column that contains the person's age (to the decade), i.e. 50, 60, 70, etc.

    okcupid %>%
      mutate(decade = floor(age / 10) * 10)
Since this data set has high earners, use filter() to remove the top 5% of earners. Hint: quantile(income, probs = 0.95) will give you the 95th percentile of income.

    okcupid %>%
      mutate(decade = floor(age / 10) * 10) %>%
      filter(income < quantile(income, probs = 0.95))
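To check that the quantile filter behaves as expected, we can try it on data where the answer is easy to verify. The sketch below assumes a hypothetical tibble holding the incomes 1, 2, ..., 100; it is not the okcupid data.

```r
library("dplyr")

# Hypothetical toy data: 100 incomes, 1, 2, ..., 100
toy = tibble(income = 1:100)

# With R's default quantile method, the cutoff lies between 95 and 96
cutoff = quantile(toy$income, probs = 0.95)

# Keep only the rows strictly below the cutoff
kept = toy %>%
  filter(income < quantile(income, probs = 0.95))

nrow(kept)
max(kept$income)
```

Inside filter(), quantile() is evaluated on the whole income column, so every row is compared against the same cutoff. Here the five largest incomes are dropped.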
Convert the decade column into a character using the as.character() function. This can be achieved via mutate(decade = as.character(decade)).

    (df = okcupid %>%
      mutate(decade = floor(age / 10) * 10) %>%
      filter(income < quantile(income, probs = 0.95)) %>%
      mutate(decade = as.character(decade))
    )
Create a boxplot with x = decade and y = income.

    ggplot(df) +
      geom_boxplot(aes(x = decade, y = income))
Add + facet_wrap(~ drugs) to the plot.

    ggplot(df) +
      geom_boxplot(aes(x = decade, y = income)) +
      facet_wrap(~drugs)