knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Sample a day of transactions by user
Note that, for now, the date must be a string formatted as YYYY-MM-DD, as in the examples below.
Note that we source a local R script that defines the database connection details.
library(DBI)
library(dbplyr)
library(dplyr)
library(clpr)
source("~/.keys/rs.R")
rs <- connect_rs()
First, let's pull a sample of transactions in a given day using sample_day_of_transactions.
date <- "2016-04-25"
transactions_df <- sample_day_of_transactions(rs, date, n_users = 100)
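Because the date currently has to be a YYYY-MM-DD string, it can help to check it before sending a query. The following is a minimal base R sketch; `is_iso_date` is a hypothetical helper written for this example and is not part of clpr.

```r
# Hypothetical helper (not part of clpr): check that a string is a valid
# calendar date written as YYYY-MM-DD.
is_iso_date <- function(x) {
  # The regular expression enforces the shape of the string.
  shape_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # as.Date() with an explicit format returns NA for impossible dates
  # such as "2016-02-30".
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shape_ok & !is.na(parsed)
}

is_iso_date("2016-04-25")
is_iso_date("04/25/2016")
```

A check like this could be run on `date` before calling sample_day_of_transactions, so that a badly formatted string fails early rather than in the database query.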
Use the as_rides function to change the unit of observation from transactions to rides, where a ride is a ride on an operator.
rides_df <- as_rides(transactions_df)
We can also create a dataframe summarizing transfers within a given time window (in minutes), using the create_transfer_df function.
transfer_df <- create_transfer_df(rides_df, 120) # 120 minutes
head(transfer_df)
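To make the idea of a transfer window concrete, here is a small base R sketch on toy data. It is not how create_transfer_df is implemented; the column names (`user`, `operator`, `time`) and the rule "a ride is a transfer if it starts within the window after the same user's previous ride" are assumptions made for illustration.

```r
# Toy rides table; column names are illustrative, not clpr's schema.
rides <- data.frame(
  user = c("a", "a", "a", "b"),
  operator = c("Ferry", "BART", "Muni", "BART"),
  time = as.POSIXct(
    c("2016-04-25 07:05:00", "2016-04-25 07:20:00",
      "2016-04-25 12:00:00", "2016-04-25 08:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Order rides chronologically within each user.
rides <- rides[order(rides$user, rides$time), ]

# Minutes since the same user's previous ride (NA for a user's first ride).
gap_min <- ave(
  as.numeric(rides$time), rides$user,
  FUN = function(t) c(NA, diff(t)) / 60
)

# Assumed rule: a ride counts as a transfer if it starts within the window.
window_min <- 120
rides$is_transfer <- !is.na(gap_min) & gap_min <= window_min
rides
```

In this toy example only user "a"'s BART ride, 15 minutes after the ferry, falls inside the 120-minute window.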
Alternatively, we can use the as_bart_journeys function to change the unit of observation from transactions to rides on BART only, with additional information about the rides that individuals may have taken before or after boarding BART. For example, taking a ferry and then BART.
bart_od <- as_bart_journeys(transactions_df)
The output includes the time of the transaction that preceded the BART tag-on. For example, a user tagged off of the ferry at 7:05 and then onto BART at 7:20. Or, a user tagged onto an SF Muni bus at 7:00 and then onto BART at 7:30. It also includes the time they tagged onto the following ride.
We can use the convenience function spread_time_column to spread the timestamp column into day of year, month, hour, and minute integers.
out_time_df <- spread_time_column(bart_od$transaction_time, prefix = "tag_out_")
in_time_df <- spread_time_column(bart_od$time_of_previous, prefix = "tag_on_")
bart_od_nicetime <- cbind(bart_od, in_time_df, out_time_df)
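For readers who want to see what spreading a timestamp into integer columns looks like, here is a base R sketch. `spread_time_sketch` is a hypothetical stand-in written for this example; the column names and exact output of clpr's spread_time_column may differ.

```r
# Hypothetical stand-in for illustration; clpr's spread_time_column may
# return different columns or names.
spread_time_sketch <- function(x, prefix = "") {
  lt <- as.POSIXlt(x)
  out <- data.frame(
    yday = lt$yday + 1L,  # POSIXlt day of year is 0-based
    month = lt$mon + 1L,  # POSIXlt month is 0-based
    hour = lt$hour,
    minute = lt$min
  )
  # Prepend the prefix so columns from several timestamps can sit side by side.
  names(out) <- paste0(prefix, names(out))
  out
}

times <- as.POSIXct(
  c("2016-04-25 07:05:00", "2016-04-25 17:30:00"),
  tz = "UTC"
)
spread_time_sketch(times, prefix = "tag_on_")
```

The prefix plays the same role as in the calls above: it keeps the tag-on and tag-out components distinguishable after they are bound to the same dataframe.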
This can make the time data easier to work with, for example when plotting a histogram of the tag-on hour.
hist(bart_od_nicetime$tag_on_hour, breaks=24)
We can also pull a full day of transactions using day_of_transactions.
rs <- connect_rs()
date <- "2016-04-25"
transactions_tbl <- day_of_transactions(rs, date)
transactions_df <- as_tibble(transactions_tbl)
time_df <- spread_time_column(transactions_df$transaction_time, prefix = "trnsct_")
transactions_df <- cbind(transactions_df, time_df)
hist(transactions_df$trnsct_hour, breaks=24)
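Then we can calculate the average number of rides per user for each product type in the day:
rides_df <- as_rides(transactions_tbl)
rides_df <- get_product_description(rides_df)
# Count rides per user and product type. summarise() gives one row per
# user/product pair, so each user is counted once in the mean below
# (transmute() would keep one row per ride and weight heavy riders more).
rides_per_user <- rides_df %>%
  group_by(cardid_anony, product_description) %>%
  summarise(total_rides = n())
# Average the per-user counts within each product type.
rides_per_type <- rides_per_user %>%
  group_by(product_description) %>%
  summarise(mean_rides = mean(total_rides))
knitr::kable(head(arrange(rides_per_type, desc(mean_rides))))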
You can contribute code, data, or questions. Please feel free to open an issue with any questions about how to use the package.