The NYC Taxi and Limousine Commission (TLC) Trip Data is a collection of trip records with fields capturing pick-up and drop-off locations and times, trip distances, fares, rate types, and driver-reported passenger counts. The data were collected and provided to the NYC TLC by technology providers under the Taxicab & Livery Passenger Enhancement Programs.
The 'nyctaxi' package currently lives on GitHub rather than CRAN, so you need to install it with 'devtools' before loading it into your R session:
```r
# install.packages("devtools")  # only needed if devtools is not installed yet
devtools::install_github("beanumber/nyctaxi")
```
Two data frames are included in this package: 'green_2016_01_sample' and 'yellow_2016_01_sample'. Each is a random sample of 1,000 observations, drawn with base R's `sample()` function from the January 2016 green and yellow taxi trip data, respectively.
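The following is a minimal sketch of that kind of draw. It is not the package's actual data-preparation script. It assumes a hypothetical data frame `full_month` filled with synthetic values that stand in for a full month of trip records, and it draws 1,000 row indices without replacement (the default for `sample()`):

```r
# Hypothetical stand-in for one month of trip records (synthetic values, not TLC data)
set.seed(2016)
full_month <- data.frame(
  trip_id = 1:50000,
  Trip_distance = round(runif(50000, min = 0, max = 20), 2)
)

# Draw 1000 distinct row indices; sample() samples without replacement by default
idx <- sample(nrow(full_month), size = 1000)
sample_1000 <- full_month[idx, ]

nrow(sample_1000)                   # 1000
anyDuplicated(sample_1000$trip_id)  # 0, so no trip was drawn twice
```

Calling `set.seed()` first makes the draw reproducible. Without it, each run would return a different 1,000-row sample.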
```r
library(nyctaxi)
data(green_2016_01_sample)
head(green_2016_01_sample)
```
To work with data covering longer time spans, use the 'etl' package to download the data and import it into a database. See the documentation for `etl_extract` for further details and examples:
```r
help("etl_extract.etl_nyctaxi")
```
The code below creates a directory on your desktop and downloads the NYC green taxi trip data for January 2016 into it. It then cleans and transforms the data and loads it into a SQLite database.
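```r
library(etl)
library(nyctaxi)
library(dplyr)  # provides the %>% pipe

# Create the working directory and an etl object (SQLite by default)
taxi <- etl("nyctaxi", dir = "~/Desktop/nyctaxi/")

# Download, clean, and load the January 2016 green taxi data
taxi %>%
  etl_extract(years = 2016, months = 1, types = "green") %>%
  etl_transform(years = 2016, months = 1, types = "green") %>%
  etl_load(years = 2016, months = 1, types = "green")
```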
```r
library(dplyr)
library(leaflet)
library(lubridate)
```
We can use leaflet to visualize the pickup and dropoff locations of the 1,000 trips in the green taxi sample:
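```r
my_trips <- green_2016_01_sample

# Drop trips whose pickup or dropoff coordinates are missing (recorded as 0)
one_cab <- my_trips %>%
  filter(Pickup_longitude != 0, Pickup_latitude != 0,
         Dropoff_longitude != 0, Dropoff_latitude != 0)

# Pickups use leaflet's default (blue) color, dropoffs are drawn in green
leaflet(data = one_cab) %>%
  addTiles() %>%
  addCircles(lng = ~Pickup_longitude, lat = ~Pickup_latitude) %>%
  addCircles(lng = ~Dropoff_longitude, lat = ~Dropoff_latitude, color = "green")
```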
We can use lubridate to parse the datetime variables and then derive the day of the week for each pickup and dropoff:
```r
clean_datetime <- my_trips %>%
  # Parse the timestamps into date-time objects
  mutate(lpep_pickup_datetime = ymd_hms(lpep_pickup_datetime),
         Lpep_dropoff_datetime = ymd_hms(Lpep_dropoff_datetime)) %>%
  # Derive the weekday name of each pickup and dropoff
  mutate(weekday_pickup = weekdays(lpep_pickup_datetime),
         weekday_dropoff = weekdays(Lpep_dropoff_datetime))
```
We can now count the trips on each day of the week, along with the average trip distance, passenger count, and total amount paid:
```r
clean_datetime %>%
  group_by(weekday_pickup) %>%
  summarize(N = n(),
            avg_dist = mean(Trip_distance),
            avg_passengers = mean(Passenger_count),
            avg_price = mean(Total_amount))
```
In this sample, Friday and Saturday had the most trips.
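Grouping on a character weekday column lists the days alphabetically rather than in calendar order. In addition, `weekdays()` returns names in the session's locale. The following base R sketch shows one way around both issues. It uses a few hypothetical January 2016 timestamps rather than the package data, takes the locale-independent ISO weekday number from `format(x, "%u")` (1 = Monday through 7 = Sunday), and turns it into a factor whose levels run Monday to Sunday:

```r
# Hypothetical pickup timestamps (not TLC records)
pickup <- as.POSIXct(c("2016-01-01 08:15:00",   # a Friday
                       "2016-01-02 23:40:00",   # a Saturday
                       "2016-01-02 01:05:00",   # a Saturday
                       "2016-01-04 17:30:00"),  # a Monday
                     tz = "UTC")

# ISO weekday number, which does not depend on the locale: 1 = Monday, 7 = Sunday
iso_day <- as.integer(format(pickup, "%u"))

# English labels in calendar order, used as factor levels
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
weekday_pickup <- factor(day_labels[iso_day], levels = day_labels)

# Trip counts per weekday, listed Monday through Sunday (days with no trips show 0)
trip_counts <- table(weekday_pickup)
trip_counts
```

The same factor conversion, applied inside `mutate()` before `group_by(weekday_pickup)`, should make the dplyr summary list the days in calendar order, because dplyr orders factor groups by their levels.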