knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(lubridate)

Taxi_trips (from the statPREP package) is a subset of a much larger data set from the New York City Taxi and Limousine Commission about taxicab trips in New York City.

The unit of observation is a single taxi trip. The recorded variables are the pickup and dropoff date-times, the distance of the trip (in miles), the fare amount (in dollars), and the number of passengers. From these have been derived the duration of the trip (in minutes), the hour_of_day at which the trip started (as a decimal, so that 16.5 means 4:30 pm), and the day_of_week of the trip. Also added is fare_distance, the distance rounded up to the nearest fifth of a mile.
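To make the derived variables concrete, here is a minimal base-R sketch of the same arithmetic applied to one trip. The pickup time, dropoff time, and distance are invented for illustration and are not a row of Taxi_trips; base R is used here only so the sketch runs without extra packages.

```r
# Hypothetical trip, invented for illustration (not taken from Taxi_trips)
pickup   <- as.POSIXct("2019-05-15 16:30:00", tz = "UTC")
dropoff  <- as.POSIXct("2019-05-15 16:42:00", tz = "UTC")
distance <- 2.31  # miles

# Duration in minutes, with the time units stated explicitly
duration <- as.numeric(difftime(dropoff, pickup, units = "mins"))

# Hour of day as a decimal: whole hours plus the minutes as a fraction
lt <- as.POSIXlt(pickup)
hour_of_day <- lt$hour + lt$min / 60

# Distance rounded up to the next multiple of 0.2 miles
fare_distance <- ceiling(distance * 5) / 5

print(c(duration = duration,
        hour_of_day = hour_of_day,
        fare_distance = fare_distance))
```

For this made-up trip the duration is 12 minutes, hour_of_day is 16.5, and fare_distance is 2.4, since 2.31 miles rounds up to the next fifth of a mile.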

Some possible questions:

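# Load the trip data and give the columns descriptive names
load("Taxi_trips.rda")
names(Taxi_trips) <- c("distance", "fare", "passengers", "pickup", "dropoff")
# Derive duration (minutes), day of week, decimal hour, and rounded-up distance.
# difftime() with explicit units avoids relying on the automatically chosen
# units of `dropoff - pickup`, which are not guaranteed to be seconds.
Taxi_trips <- Taxi_trips %>%
  mutate(duration = as.numeric(difftime(dropoff, pickup, units = "mins")),
         day_of_week = lubridate::wday(pickup, label = TRUE),
         hour_of_day = lubridate::hour(pickup) +
           lubridate::minute(pickup) / 60,
         fare_distance = ceiling(distance * 5) / 5)
save(Taxi_trips, file = "../../data/Taxi_trips.rda")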


dtkaplan/statPREPpackage documentation built on May 15, 2019, 5:22 p.m.