suppressPackageStartupMessages(library(rgdal)) suppressPackageStartupMessages(library(ggplot2)) suppressPackageStartupMessages(library(dplyr))
Here we have data provided by William Henderson, containing data from 5 riders.
bikeroutes.shapefile <- readOGR("../data/RIDE/william-trips-export.json", "OGRGeoJSON", p4s="+proj=tmerc +ellps=WGS84") bikeroutes.df <- fortify(bikeroutes.shapefile) head(bikeroutes.df) ggplot(bikeroutes.df, aes(x=long, y=lat, group=group)) + geom_path()
Okay. The data produced by fortifying the bikeroutes dataframe is rather strange.
barplot(table(bikeroutes.df$order)) barplot(table(bikeroutes.df$piece)) barplot(table(bikeroutes.df$group))
Group has 8192 levels, but there were originally 7712 lines. How do we know which ride is which? The group has a strange system, where the numbers are zero-indexed and the decimal place corresponds to the piece. So to get an ID we can just
bikeroutes.df$group %>% as.character() %>% as.numeric() %>% ceiling() %>% summary() bikeroutes.df$id <- bikeroutes.df$group %>% as.character() %>% as.numeric() %>% ceiling()
Really we need two dataframes: one for all the points that make up the paths and one just with an observation for each route.
rides <- bikeroutes.shapefile@data %>% select(rating, rating_text, owner__pk, original_trip_length, created, pk, rr_transformation) %>% # No need to have the actual primary keys here. Not good for security anyways. mutate(owner__pk = as.numeric(owner__pk), pk = as.numeric(pk)) %>% tbl_df()
Okay, but now we have another weird issue.
head(rides) head(bikeroutes.df %>% filter(order == 1))
It seems like we have less data than we thought. It seems that there are four
observations of every ride. One variable rr_transformation
seems to explain
on dimension of this duplication. This does explain why there are only 1928
primary keys and 7712 observations:
$$1928 \times 4 = 7712.$$
# Add id to rides dataframe rides$id <- 1:nrow(rides) # Give a number to each of copies rides$version <- as.factor(rep(c(1,2,3,4), 1928)) # Join over to paths rides.final <- bikeroutes.df %>% inner_join(rides, by="id") # Plot the different versions rides.final %>% ggplot(aes(x = long, y = lat, group = group)) + geom_path(alpha=0.3, lineend = "butt") + coord_map(projection = "mercator", ylim = c(45.5, 45.525), xlim = c(-122.68, -122.64)) + facet_wrap(~ version)
Okay, so the version I actually want are the 'simplify' versions. Let's select version 2.
rides.final <- rides.final %>% filter(version == "2")
This seems to include locations outside of Portland, OR. So let's filter this data.
not.in.portland <- rides.final %>% filter(lat < 45.462 | lat > 45.549 | long > -122.577 | long < -122.722) %>% distinct(pk) %>% .$pk rides.final %>% filter(!pk %in% not.in.portland) %>% ggplot(aes(x = long, y = lat, group = group)) + geom_path(alpha=0.3, aes(col=rating), lineend = "butt") + coord_map(projection = "mercator") + scale_color_gradient2(low = "yellow", mid = "green", high = "red")
That's quite a bit of coverage for only five people!
How do they seem to rate their rides?
rides %>% ggplot(aes(x = rating_text)) + geom_histogram() + facet_grid(~owner__pk)
Alright let's save this data for now.
save(rides.final, file = "../data/bikeroutes.RData") save(rides, file = "../data/rides.RData")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.