We'll start by loading the necessary packages and data sets:

    library("dplyr")
    library("ggplot2")
    data(okcupid, package = "jrTidyverse")

In this section, we will gradually chain the commands together. We'll start things off by calculating the average income:

    new_data = okcupid %>%
      summarise(ave_income = mean(income))
    new_data

summarise() can return several summaries at once; here the median is computed alongside the mean:

    okcupid %>%
      summarise(ave_income = mean(income),
                med_income = median(income))

Use group_by() to calculate the mean income conditional on the answer to the pets question.

    okcupid %>%
      group_by(pets) %>%
      summarise(ave_income = mean(income))

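To see what group_by() changes, it can help to run the same pipeline on a tiny tibble where the answer is easy to check by hand. The sketch below uses a made-up tibble called toy (its values are invented for illustration and are not taken from okcupid).

```r
library("dplyr")

# Hypothetical toy data, invented for illustration (not from okcupid)
toy = tibble(
  pets   = c("cats", "cats", "dogs", "dogs", "none"),
  income = c(20000, 40000, 30000, 50000, 10000)
)

# Without group_by(): a single row summarising every observation
overall = toy %>%
  summarise(ave_income = mean(income))
overall

# With group_by(): one row per distinct value of pets
by_pets = toy %>%
  group_by(pets) %>%
  summarise(ave_income = mean(income))
by_pets
```

The grouped version returns three rows (cats, dogs, none) with means 30000, 40000 and 10000, whereas the ungrouped version collapses everything into the single value 30000.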
The arrange() function is used to sort a tibble, e.g.

    ... %>%
      arrange(ave_income)

will arrange the tibble from smallest to largest. Arrange the tibble from largest to smallest in terms of average income. Remember, you can look up the help page using ?arrange.

    (df = okcupid %>%
       group_by(pets) %>%
       summarise(ave_income = mean(income)) %>%
       arrange(desc(ave_income)))

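The difference between the default ordering and desc() is easiest to see side by side. The following is a small sketch on a made-up summary table (toy_summary is a hypothetical name, and its values are invented rather than computed from okcupid).

```r
library("dplyr")

# Hypothetical summary table, invented for illustration
toy_summary = tibble(
  pets       = c("cats", "dogs", "none"),
  ave_income = c(30000, 40000, 10000)
)

# Default behaviour: ascending order (smallest first)
ascending = toy_summary %>%
  arrange(ave_income)

# Wrapping the column in desc() reverses the order (largest first)
descending = toy_summary %>%
  arrange(desc(ave_income))

ascending$pets   # "none" "cats" "dogs"
descending$pets  # "dogs" "cats" "none"
```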
Use geom_col() to plot your results. Hint: use + coord_flip() to rotate your plot.

    ggplot(df) +
      geom_col(aes(x = pets, y = ave_income)) +
      coord_flip()

mutate()
The floor() function rounds down to the nearest integer. To round down to the nearest multiple of 10, we use the trick

    floor(61 / 10) * 10   # 60
    floor(119 / 10) * 10  # 110

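Because floor() is vectorised, the same trick works on a whole column at once, which is why it can be used inside mutate(). As a rough sketch, the helper below (round_down_to is a hypothetical name, not a function from base R or dplyr) wraps the trick for an arbitrary bin width and contrasts it with round(), which goes to the nearest multiple rather than always rounding down.

```r
# Hypothetical helper, not part of base R or dplyr:
# round x down to a multiple of `width`
round_down_to = function(x, width = 10) {
  floor(x / width) * width
}

ages = c(19, 20, 29, 61, 119)

round_down_to(ages)      # 10 20 20 60 110
round_down_to(ages, 50)  # 0 0 0 50 100

# For comparison, round() with a negative digits argument
# goes to the *nearest* 10 instead of rounding down
round(119, -1)           # 120
```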
Use the mutate() function to create a new column that contains the person's age (to the decade), i.e. 50, 60, 70, etc.

    okcupid %>%
      mutate(decade = floor(age / 10) * 10)

Since this data set has high earners, use filter() to remove the top 5% of earners. Hint: quantile(income, probs = 0.95) will give you the 95th percentile of income.

    okcupid %>%
      mutate(decade = floor(age / 10) * 10) %>%
      filter(income < quantile(income, probs = 0.95))

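To check that the quantile filter behaves as expected, the sketch below applies it to a made-up column where the cut-off can be worked out by hand. The tibble toy is hypothetical and simply holds the incomes 1 to 100.

```r
library("dplyr")

# Hypothetical toy data: incomes 1, 2, ..., 100 (invented for illustration)
toy = tibble(income = 1:100)

# The cut-off is computed from the whole income column
cutoff = quantile(toy$income, probs = 0.95)
cutoff  # 95.05 with R's default quantile type

# Keep only the rows strictly below the cut-off
trimmed = toy %>%
  filter(income < quantile(income, probs = 0.95))

nrow(trimmed)        # 95
max(trimmed$income)  # 95
```

Note that inside filter() the quantile is evaluated on the data as it stands at that point in the chain. If the tibble were grouped, the cut-off would be computed separately within each group.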
Convert the decade column into a character using the as.character() function. This can be achieved via mutate(decade = as.character(decade)).

    (df = okcupid %>%
       mutate(decade = floor(age / 10) * 10) %>%
       filter(income < quantile(income, probs = 0.95)) %>%
       mutate(decade = as.character(decade)))

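The conversion matters because a character column is treated as a discrete variable when plotting, so each decade becomes its own category. A minimal sketch of the conversion on a made-up tibble (toy and its ages are invented for illustration):

```r
library("dplyr")

# Hypothetical toy data, invented for illustration
toy = tibble(age = c(23, 37, 41, 58))

toy_num = toy %>%
  mutate(decade = floor(age / 10) * 10)

toy_chr = toy_num %>%
  mutate(decade = as.character(decade))

class(toy_num$decade)  # "numeric"
class(toy_chr$decade)  # "character"
toy_chr$decade         # "20" "30" "40" "50"
```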
Create a boxplot with x = decade and y = income.

    ggplot(df) +
      geom_boxplot(aes(x = decade, y = income))

Add + facet_wrap(~ drugs) to draw a separate panel for each value of the drugs column.

    ggplot(df) +
      geom_boxplot(aes(x = decade, y = income)) +
      facet_wrap(~ drugs)