knitr::opts_chunk$set(echo = TRUE)
# Note: move to vignettes folder eventually

Reading and cleaning data

Nikée has written an R package, TravelAIR, for working with the TAI data.

This function, for example, reads in the custom data format:

readData <- function(inputPath, minObsDate="2014-01-01",
                     maxObsDate="2016-10-01",
                     saveOutput=TRUE, outputPath=inputPath,
                     saveSummary=TRUE, summaryName="Aug14-16_Summary",
                     summaryPath=inputPath){ ... }
# Example usage: takes about 1 minute to process 180 MB
# system.time(readData(inputPath = "tai-private-data/"))

I/O results

**Original data**

**The processed data**

Cleaning the data

Extensible function for cleaning the data:

cleanData <- function(inputPath, inputName="Base",
                      methodSpecific=c("Car", "Bicycle"),
                      thresholdSpeed=c(rep(240,11),1200,240),
                      unrealMerge=FALSE,
                      removeMethodUnknown=FALSE,
                      outputPath=inputPath, outputName="Cleaned"){ ... }
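
The idea of speed-based cleaning can be illustrated on toy data. The sketch below is not the package's implementation: the column names (`method`, `speed`), the helper `filter_by_speed()` and the limit values are all assumptions made for illustration. It drops rows whose speed exceeds a per-mode limit.

```r
# Toy trips: column names and values are assumptions for illustration
trips <- data.frame(
  method = c("Car", "Car", "Bicycle", "Bicycle", "Walk"),
  speed  = c(60, 300, 20, 90, 5),
  stringsAsFactors = FALSE
)

# Hypothetical per-mode speed limits (same units as `speed`)
speed_limits <- c(Car = 240, Bicycle = 60, Walk = 15)

# Keep rows whose speed is at or below the limit for their mode.
# Rows with a mode that has no limit defined are dropped.
filter_by_speed <- function(d, limits) {
  limit_per_row <- limits[d$method]
  d[!is.na(limit_per_row) & d$speed <= limit_per_row, , drop = FALSE]
}

trips_clean <- filter_by_speed(trips, speed_limits)
nrow(trips_clean)
```

A named vector keeps the mode-to-limit mapping explicit, which is easier to audit than a positional vector of thresholds.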

Example of data cleaning

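# Load the same agent's data before (a) and after (ac) cleaning
a = read.csv("travelAIData/BasicData/Base_Agent-101.csv")
ac = read.csv("travelAIData/BasicData/Cleaned_Agent-101.csv")
# Compare row counts and speed distributions
nrow(a)
nrow(ac)
summary(a$speed)
summary(ac$speed)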

(Pseudo) anonymisation

# Brute force approach
ac[c("to_locx", "to_locy")] <- ac[c("to_locx", "to_locy")] +
  runif(n = nrow(ac) * 2, min = -0.01, max = 0.01)
# Possible alternative: clip the start and end of each line, see stplanr::toptail
args(stplanr::toptail)
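
The brute force approach can be wrapped in a small helper so the size of the displacement is explicit and checkable. This is a minimal sketch on toy data: `jitter_coords()` and the example coordinates are hypothetical, not part of TravelAIR. If the coordinates are in decimal degrees, an offset of 0.01 corresponds to roughly 1 km in latitude.

```r
# Add independent uniform noise to each coordinate column (hypothetical helper)
jitter_coords <- function(d, cols, max_offset = 0.01) {
  for (col in cols) {
    d[[col]] <- d[[col]] + runif(nrow(d), min = -max_offset, max = max_offset)
  }
  d
}

set.seed(42)  # reproducible noise, for the example only
toy <- data.frame(to_locx = c(-1.55, -1.54, -1.53),
                  to_locy = c(53.80, 53.81, 53.82))
toy_jittered <- jitter_coords(toy, c("to_locx", "to_locy"))

# Largest displacement applied to any coordinate (bounded by max_offset)
max(abs(as.matrix(toy_jittered) - as.matrix(toy)))
```

Because the noise is bounded, the true location always lies within a known box around the published one, which is why this only counts as pseudo-anonymisation.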

Issue: international forays

plot(ac$to_locx, ac$to_locy) # viz issues

Additional cleaning steps (prototype)

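# Keep destinations between the 10th and 90th percentiles of the origin
# coordinates, on both axes
a_bounds_x = quantile(ac$from_locx, probs = c(0.1, 0.9))
a_bounds_y = quantile(ac$from_locy, probs = c(0.1, 0.9))
sel_bb = ac$to_locx > a_bounds_x[1] & ac$to_locx < a_bounds_x[2] &
  ac$to_locy > a_bounds_y[1] & ac$to_locy < a_bounds_y[2]
ac = ac[sel_bb,]
plot(ac$to_locx, ac$to_locy)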
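
A reusable version of this prototype might look like the sketch below, which filters on both axes using quantile bounds. The helper `filter_bbox()` and the toy coordinates are hypothetical and only illustrate the technique.

```r
# Keep rows whose (x, y) falls inside the quantile box on both axes
filter_bbox <- function(d, x, y, probs = c(0.1, 0.9)) {
  bx <- quantile(d[[x]], probs = probs, na.rm = TRUE)
  by <- quantile(d[[y]], probs = probs, na.rm = TRUE)
  inside <- d[[x]] >= bx[1] & d[[x]] <= bx[2] &
    d[[y]] >= by[1] & d[[y]] <= by[2]
  d[which(inside), , drop = FALSE]  # which() also drops rows with NA coordinates
}

# Toy data: ten local points plus one far-away point
toy <- data.frame(
  to_locx = c(-1.60 + 0.01 * 0:9, 2.35),
  to_locy = c(53.75 + 0.01 * 0:9, 48.86)
)
toy_local <- filter_bbox(toy, "to_locx", "to_locy", probs = c(0.05, 0.95))
nrow(toy_local)
```

Quantile trimming removes the far-away point, but it also discards legitimate local points near the edges, so the choice of `probs` is a trade-off.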

Visualisation

library(leaflet.extras)
leaflet() %>%
  addTiles() %>%
  addWebGLHeatmap(ac$to_locx, ac$to_locy, size = 10000, units = "m",
                  alphaRange = 0.00001)

Analysis of the data

Identification of aggregate patterns in commute behaviour

Some results I | Main mode of commute by age band

knitr::kable(readr::read_csv("vignettes/results-age-mode.csv"))

Some results II | Main mode of commute by gender

knitr::kable(readr::read_csv("vignettes/results-gender-mode.csv"))
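
Tables of this kind can be derived from trip-level records by first finding each person's most frequent mode and then cross-tabulating. The sketch below uses toy data. The column names (`agent`, `age_band`, `travel_mode`) and the helper `main_mode()` are assumptions, and it is not necessarily how the results files above were produced.

```r
# Toy trips: column names and values are assumptions for illustration
trips <- data.frame(
  agent       = c(1, 1, 1, 2, 2, 3, 3, 3),
  age_band    = c("18-29", "18-29", "18-29", "30-44", "30-44",
                  "45-64", "45-64", "45-64"),
  travel_mode = c("Car", "Car", "Bus", "Bicycle", "Bicycle",
                  "Bus", "Bus", "Car"),
  stringsAsFactors = FALSE
)

# Most frequent mode in a vector; ties go to the alphabetically first mode
main_mode <- function(x) names(which.max(table(x)))

# One row per agent with their main mode
per_agent <- aggregate(travel_mode ~ agent + age_band, data = trips,
                       FUN = main_mode)

# Number of agents by age band and main mode
mode_by_age <- table(per_agent$age_band, per_agent$travel_mode)
mode_by_age
```

Swapping `age_band` for a gender column would give the corresponding breakdown by gender.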


CatchDat/TravelAIR documentation built on May 6, 2019, 9:28 a.m.