Data Manipulation

This section will hopefully help you get more comfortable with some of the dplyr functionality for "wrangling" your data. We will do some data wrangling and the use that to create some plots. Make sure you load in the dplyr package and the movies data set.

library("dplyr")
data("movies", package = "jrIntroduction")
  1. We want to look at how film budgets for films in English have changed over time for both the Comedy and non Comedy films. To start with, we should filter the data set such that it only contains films spoken in English. Try using the %>% notation, i.e. movies %>% filter(...)

    r movies %>% filter(language == "English")

  2. We want to look at comedy and non comedy films in each year, this is some sort of grouping structure which suggests use of the group_by() function. Create this grouping structure on the filtered movies data

    r movies %>% filter(language == "English") %>% group_by(year, comedy)

  3. Use the summarise() function to calculate the average budget in each year for both comedy and non comedy films.

    r movies %>% filter(language == "English") %>% group_by(year, comedy) %>% summarise(b = mean(budget))

Question 2

Run the folowing R code:

data(USnames, package = "jrIntroduction")

\noindent The tibble USnames is a collection of names given to babies born in the US between 2011 and 2014.

  1. Make sure you are comfortable with what the data looks like using head() and colnames().

    r head(USnames) colnames(USnames)

  2. How many children were born in 2012? Hint: use filter() then summarise()

    r filter(USnames, Year == 2012) %>% summarise(sum(Count))

  3. Were more male or females children born during the four years? Hint: use group_by()

    ```r USnames %>% group_by(Gender) %>% summarise(sum(Count))

    more males

    ```

  4. Tricky: How many names in 2011 were used less than 10 times? Hint: names can be used for both male and female.

    r USnames %>% filter(Year == 2011) %>% group_by(Name) %>% summarise(Count = sum(Count)) %>% filter(Count < 10) %>% count()

Solutions

Solutions to the practical questions are contained within the package

vignette("solutions4", package = "jrIntroduction")


jr-packages/jrIntroduction documentation built on May 24, 2020, 5:19 p.m.