As usual, let's load the packages and data needed for this practical.

library("dplyr")
library("lubridate")
library("ggplot2")
data(okcupid, package = "jrTidyverse")

When were you born? (you can lie if you want to)

  1. Store your birth date as a character variable i.e.

    bday = "11/04/1967"

  2. Convert it into a date object using dmy().

    bday = dmy(bday)

  3. Which day of the week were you born on? Hint: Use wday(). Notice R returns the weekday as a number. To clarify this, set the argument label equal to TRUE inside wday.

    wday(bday, label = TRUE)

  4. Using the year() function, change the year of your date object to the year of your next birthday. What day of the week does it fall on?

    # 2018 is only an example; use the year of your own next birthday
    year(bday) = 2018
    wday(bday, label = TRUE)

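  5. How many days is it until your next birthday? What about seconds since you were born? Hint: build an interval() first, then use the unit argument inside as.period().

    today_date = today()
    # bday now holds the next birthday, so these intervals run from today to that date
    as.period(interval(today_date, bday), unit = "year")
    as.period(interval(today_date, bday), unit = "day")
    # Seconds since birth need the original birth date, because bday was changed in step 4
    as.period(interval(dmy("11/04/1967"), today_date), unit = "seconds")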
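
The hard-coded year in step 4 goes out of date quickly. Here is a minimal sketch, assuming the example birth date from step 1, of how the next birthday could instead be derived from today's date. The names `birth`, `current` and `next_bday` are illustrative and are not used elsewhere in this practical.

```r
library("lubridate")

birth = dmy("11/04/1967")  # example birth date from step 1
current = today()

# Move the birthday into the current year...
next_bday = birth
year(next_bday) = year(current)
# ...and push it on by a year if it has already passed
if (next_bday < current) {
  next_bday = next_bday + years(1)
}

# Day of the week of the next birthday
wday(next_bday, label = TRUE)

# Whole days from today until the next birthday
days_left = as.numeric(next_bday - current)
days_left

# Seconds from the start of the birth date to the start of today
secs_alive = time_length(interval(birth, current), unit = "second")
secs_alive
```

A birth date of 29 February would need extra handling, since that date does not exist in most years.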

OKCupid

Take our OKCupid data. Suppose we want to look at the distribution of the weekday on which people were last online. Effectively, we are asking: "On which day of the week do people use OKCupid the most?"

  1. Using mutate() and ymd_hms(), convert the last_online column to a proper date-time. Hint: remember to set the time zone in ymd_hms() via tz = "America/Los_Angeles".

    okcupid = okcupid %>%
      mutate(last_online = ymd_hms(last_online, tz = "America/Los_Angeles"))

  2. Create a new column called week_day that contains the day of the week a user last accessed OKCupid. Hint: use mutate() and wday().

    okcupid = okcupid %>%
      mutate(week_day = wday(last_online, label = TRUE))

  3. Create a bar chart of the day of the week using geom_bar(). Which days are most popular?

    ggplot(okcupid, aes(x = week_day)) +
      geom_bar() +
      xlab("Week day") +
      ylab("Count")
    # Friday and Saturday are the two most popular

  4. How does this compare for men and women?

    # Either use a graph to find out
    ggplot(okcupid, aes(x = week_day)) +
      geom_bar() +
      xlab("Week day") +
      ylab("Count") +
      facet_wrap(~sex)

    # or a summary data frame
    okcupid %>%
      group_by(sex) %>%
      count(week_day)

  5. Create a bar chart showing the distribution of the hour of the day at which OKCupid users were last online. You should end up with something like the figure below.

    okcupid = okcupid %>%
      mutate(lo_hour = hour(last_online))
    ggplot(okcupid, aes(x = lo_hour)) +
      geom_bar() +
      xlab("Hour of the day") +
      ylab("Count")
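
The hint in step 1 asks for a time zone without saying what it changes. The sketch below uses a made-up timestamp (not taken from the okcupid data) to illustrate the difference: the tz argument of ymd_hms() decides which instant a clock reading refers to, while with_tz() shows an existing instant on another zone's clock. The names `stamp`, `utc`, `la` and `utc_in_la` are illustrative.

```r
library("lubridate")

# Hypothetical timestamp, chosen so that the zone changes the calendar day
stamp = "2012-06-29 03:30:00"

utc = ymd_hms(stamp)                             # read as UTC (the default)
la = ymd_hms(stamp, tz = "America/Los_Angeles")  # same clock reading, read as LA time

# Both objects show the same clock hour...
hour(utc)
hour(la)

# ...but they are different instants, several hours apart
difftime(la, utc, units = "hours")

# The UTC instant shown on a Los Angeles clock: the hour and the weekday both change
utc_in_la = with_tz(utc, "America/Los_Angeles")
hour(utc_in_la)
wday(utc, label = TRUE)
wday(utc_in_la, label = TRUE)
```

If the recorded times are local Los Angeles clock readings, as the hint implies, passing tz when parsing keeps hour() and wday() on the local clock while also pinning each value to the correct instant.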


jr-packages/jrTidyverse documentation built on Oct. 11, 2020, 9:03 p.m.