library(knitr) #options(width=100) options(scipen=9999) opts_chunk$set(echo=TRUE) opts_chunk$set(message=FALSE) opts_chunk$set(warning=FALSE) opts_chunk$set(cache=FALSE) opts_chunk$set(collapse=TRUE)
You need the devtools
package installed and then you can use:
devtools::install_github("zross/sensorDataImport")
This vignette assumes you have already installed PostgreSQL on a Windows machine. Once PostgreSQL and the package have been installed you will need to install the database and tables. Note that if you already have a database and table installation you will need to delete the database and re-install if the table structures have changed. Keep in mind that this will delete all data and should only be done if you're sure that this is OK.
1) If you need to delete the database and reinstall open pgAdmin III, the software to administer the database. Find the database you want to delete in the "object browser" then right click and choose Delete/Drop
2) To reinstall the database (on WINDOWS) you nead to load the sensorDataImport package with library(sensorDataImport)
and then type:
create_database("columbiaBike")
3) To reinstall the tables you then type:
add_tables_db("columbiaBike")
Keep in mind that the package is in the early stages of development. The database and table creation scripts have not been adapted for Macs. But you should be able to run the code in a Mac console. For example, the code used to create the database is:
createdb -h localhost -p 5432 -U postgres columbiaBike
The code to load the tables is:
psql -p 5432 -U postgres -d columbiaBike -a -f create_tables.sql
The file create_tables.sql
can be found in the sql folder in your package directory. Your package will be installed in one of the paths listed when you type:
.libPaths()
Within the folders you can find the sql folder and in that is a file called create_tables.sql
.
To launch the Shiny app for uploading data you use:
library(sensorDataImport) runShiny("nyc")
You need to be connected to the DB to upload
On the "to-do" list we will grey out the upload button if you're not connected but for now you are allowed to click on upload of you're not connected but you will get errors and will need to close and restart!
On the left you will need to make sure the parameters are correct. You will want to select the biking project. Your port is likely 5432. If you have the right parameters you should see "Connected to DB" in green at the bottom left.
To upload you can simply use the upload button and select as many files as you want (using CTRL/CMD to select multiple) and then click OK. You will see a progress indicator. Your filenames need to follow our file naming conventions exactly.
For now the extraction and aggregation is done within R (rather than a Shiny app). In order to do this you will need a valid connection to the database. The password will be the password you used when you installed Postgres. Please let me know if you have problems with this piece.
library(sensorDataImport) get_connection("columbiaBike","PASSWORD", host="localhost", port=5432, user="postgres")
The get_connection
function creates an object called .connection
that should look like:
src: postgres 9.3.4 [postgres@localhost:5432/columbiaBike] tbls: abp, gps, hxi, mae, mpm, pdr
With the valid connection you are ready to extract the data. The function is get_sensor_data
and it allows you to get the raw data or aggregated data. The function arguments look like:
tablename = This is the three letter table name (e.g., 'gps') do_aggregate = Do you want to aggregate? Default is FALSE clean_first = Do you want to clean the data first (not really implemented). Defaults to TRUE aggregation_unit = What time unit do you want to aggregate to? Defaults to "15 min". Options might include ("10 min", "1 day", etc). Allows you to use "complete" which means essentially, a time unit of 50000 days. xtravars = In the RAW data, RAW data only, what extra variables do you want included. Defaults to "all" which gives you all the variables in the table summarize_vars = What are the variables you want to summarize (average, mean etc). Defaults to NULL which is the only value allowed if you're not aggregating. grouping_vars = What variables should be used to aggregate by. Defaults to c("subjectid", "sessionid")
library(sensorDataImport) # your connection parameters will be different -- the host # for you should probably be "localhost" get_connection("columbiaBike","spatial", host="localhost", port=5432, user="postgres") .connection
list_tables() table_exists("blah") table_exists("gps") get_column_names("hxi") get_row_count("mpm")
library(sensorDataImport) # your connection parameters will be different with, probably, port 5432 get_connection("columbiaBike","spatial", host="localhost", port=5432, user="postgres") .connection gps_raw_clean <- get_sensor_data("gps") head(gps_raw_clean)
Note that this is NOT averaging by a time interval -- it's averaging all records by the grouping variables.
# by default aggregate is by subject, session get_sensor_data("gps", do_aggregate=TRUE, aggregation_unit="complete", xtravars=NULL, summarize_vars=c("latitude", "longitude"))
The aggregation_unit argument accepts any number of minutes, hours or days.
micropem <- get_sensor_data("mpm", do_aggregate = TRUE, aggregation_unit = "13 min", xtravars = NULL, summarize_vars = c("temperature", "flow", "neph_rhcorrect"), grouping_vars = c("subjectid", "sessionid")) head(micropem)
get_connection("columbiaBike","spatial", host="localhost", port=5432, user="postgres")
# get the hex data at one minute # this might take some time hex <- get_sensor_data("hxi", do_aggregate = TRUE, aggregation_unit = "1 min", xtravars = NULL, summarize_vars = c("breathing_rate", "heart_rate", "minute_ventilation", "cadence"), grouping_vars = c("subjectid", "sessionid")) # careful, no compliance or HEPA correction applied micropem <- get_sensor_data("mpm", do_aggregate = TRUE, aggregation_unit = "1 min", xtravars = NULL, summarize_vars = c("temperature", "flow", "neph_rhcorrect"), grouping_vars = c("subjectid", "sessionid")) gps<- get_sensor_data("gps", do_aggregate = TRUE, aggregation_unit = "1 min", xtravars = NULL, summarize_vars = c("latitude", "longitude"), grouping_vars = c("subjectid", "sessionid"))
For the pre-pilot data the dates/times do not seem to correspond that well so let's add date to make it easier to see where they match.
library(dplyr) hex <- mutate(hex, date = format(begin, "%Y-%m-%d")) micropem <- mutate(micropem, date = format(begin, "%Y-%m-%d")) gps <- mutate(gps, date = format(begin, "%Y-%m-%d")) # take a look at unique combos select(hex, date, subjectid, sessionid) %>% distinct select(micropem, date, subjectid, sessionid) %>% distinct select(gps, date, subjectid, sessionid) %>% distinct
In theory this would work -- but in the pre-pilot testing data there is not a lot of matching data.
hex_pem <- inner_join(hex, micropem, by=c("subjectid", "sessionid", "begin", "end")) hex_gps <- inner_join(hex, gps, by=c("subjectid","sessionid", "begin", "end")) pem_gps <- inner_join(micropem, gps, by=c("subjectid", "sessionid", "begin", "end")) # clean up example pem_gps <- mutate(pem_gps, date = date.y) %>% select(-c(interval.x, interval.y, date.x))
library(ggplot2) library(ggmap) qmap("George Washington Bridge, NYC", zoom = 11, color = "bw", legend = "topleft")+ geom_point(data=pem_gps, aes(longitude_avg, latitude_avg, color=temperature_avg))+ scale_color_gradient(low="darkkhaki", high="darkgreen")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.