library(knitr)

# options(width = 100)
options(scipen = 9999)
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
               cache = FALSE, collapse = TRUE)

Installing

You need the devtools package installed and then you can use:

devtools::install_github("zross/sensorDataImport")

Database and tables

This vignette assumes you have already installed PostgreSQL on a Windows machine. Once PostgreSQL and the package have been installed you will need to install the database and tables. Note that if you already have the database and tables installed and the table structures have changed, you will need to delete the database and re-install it. Keep in mind that this will delete all data and should only be done if you're sure that this is OK.

1) If you need to delete the database and reinstall, open pgAdmin III, the software used to administer the database. Find the database you want to delete in the "object browser", then right-click and choose Delete/Drop.

2) To reinstall the database (on Windows) you need to load the sensorDataImport package with library(sensorDataImport) and then type:

create_database("columbiaBike")

3) To reinstall the tables you then type:

add_tables_db("columbiaBike")

Notes about Macs

Keep in mind that the package is in the early stages of development and the database and table creation scripts have not been adapted for Macs. You should, however, be able to run the equivalent commands in a Mac terminal. For example, the code used to create the database is:

createdb -h localhost -p 5432 -U postgres columbiaBike

The code to load the tables is:

psql -p 5432 -U postgres -d columbiaBike -a -f create_tables.sql

The file create_tables.sql can be found in the sql folder in your package directory. Your package will be installed in one of the paths listed when you type:

.libPaths()

Within the sensorDataImport folder under one of those paths you will find the sql folder, and in it the file create_tables.sql.
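If the sql folder is installed along with the package, you should also be able to locate the file directly from R with base R's system.file (a minimal sketch; it returns an empty string if the file is not found):

# full path to create_tables.sql inside the installed package ("" if not found)
system.file("sql", "create_tables.sql", package = "sensorDataImport")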

Launch app and connect

To launch the Shiny app for uploading data you use:

library(sensorDataImport)
runShiny("nyc")

You need to be connected to the DB to upload

On the "to-do" list we will grey out the upload button if you're not connected but for now you are allowed to click on upload of you're not connected but you will get errors and will need to close and restart!

On the left you will need to make sure the parameters are correct. You will want to select the biking project. Your port is likely 5432. If you have the right parameters you should see "Connected to DB" in green at the bottom left.

Uploading

To upload, click the upload button, select as many files as you want (using CTRL/CMD to select multiple files), and then click OK. You will see a progress indicator. Your filenames need to follow our file naming conventions exactly.

Extracting and aggregating data

For now the extraction and aggregation are done within R (rather than in a Shiny app). In order to do this you will need a valid connection to the database. The password will be the password you used when you installed PostgreSQL. Please let me know if you have problems with this piece.

library(sensorDataImport)
get_connection("columbiaBike","PASSWORD",  host="localhost",
    port=5432, user="postgres")

The get_connection function creates an object called .connection that should look like:

src:  postgres 9.3.4 [postgres@localhost:5432/columbiaBike]
tbls: abp, gps, hxi, mae, mpm, pdr

With the valid connection you are ready to extract the data. The function is get_sensor_data and it allows you to get the raw data or aggregated data. The function arguments look like:

tablename = This is the three letter table name (e.g., 'gps')

do_aggregate = Do you want to aggregate? Default is FALSE

clean_first = Do you want to clean the data first (not really implemented). Defaults to TRUE

aggregation_unit = What time unit do you want to aggregate to? Defaults to "15 min"; other options include "10 min", "1 day", etc. You can also use "complete", which essentially aggregates over the entire record (a time unit of 50,000 days).

xtravars = For the RAW (non-aggregated) data only, what extra variables do you want included. Defaults to "all", which gives you all the variables in the table (see the sketch after this list).

summarize_vars = The variables you want to summarize (averaged, etc.). Defaults to NULL, which is the only value allowed if you're not aggregating.

grouping_vars = The variables to aggregate by. Defaults to c("subjectid", "sessionid")
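As a quick illustration of how these arguments fit together, here is a minimal sketch of a raw (non-aggregated) pull that keeps only a couple of columns. The column names passed to xtravars are assumptions; check get_column_names("gps") for the actual names in your table.

# raw GPS records restricted to a few columns; the column names below
# are assumptions -- verify them with get_column_names("gps")
gps_subset <- get_sensor_data("gps",
                              do_aggregate = FALSE,
                              xtravars     = c("latitude", "longitude"))
head(gps_subset)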

Examples

Get connection

library(sensorDataImport)
# your connection parameters will be different -- the host
# for you should probably be "localhost"
get_connection("columbiaBike","spatial",  host="localhost",
    port=5432, user="postgres")

.connection

Get info on tables

list_tables()            # list the tables in the database
table_exists("blah")     # FALSE -- no such table
table_exists("gps")      # TRUE
get_column_names("hxi")  # column names for the hxi table
get_row_count("mpm")     # number of rows in the mpm table

Grab raw data (but it will get cleaned first if there is a cleaning function):

library(sensorDataImport)
# your connection parameters will be different; the port is probably 5432
get_connection("columbiaBike","spatial",  host="localhost",
    port=5432, user="postgres")

.connection
gps_raw_clean <- get_sensor_data("gps")
head(gps_raw_clean) 

For GPS, get subject-specific centroid by averaging all records by subject/session

Note that this is NOT averaging by a time interval -- it's averaging all records by the grouping variables.

# by default, aggregation is by subject and session
get_sensor_data("gps", do_aggregate=TRUE, aggregation_unit="complete",
                xtravars=NULL, summarize_vars=c("latitude", "longitude"))

You can aggregate at odd intervals if desired

The aggregation_unit argument accepts any number of minutes, hours or days.

micropem <- get_sensor_data("mpm",
                do_aggregate     = TRUE,
                aggregation_unit = "13 min",
                xtravars         = NULL,
                summarize_vars   = c("temperature", "flow", "neph_rhcorrect"),
                grouping_vars    = c("subjectid", "sessionid"))

head(micropem)

A complete and relevant example

Get connected

get_connection("columbiaBike","spatial",  host="localhost",
               port=5432, user="postgres")

Grab the data

# get the hex data at one minute
# this might take some time
hex <- get_sensor_data("hxi",
                            do_aggregate     = TRUE,
                            aggregation_unit = "1 min",
                            xtravars         = NULL,
                            summarize_vars   = c("breathing_rate", "heart_rate", "minute_ventilation", "cadence"),
                            grouping_vars    = c("subjectid", "sessionid"))




# careful, no compliance or HEPA correction applied
micropem <- get_sensor_data("mpm",
                            do_aggregate     = TRUE,
                            aggregation_unit = "1 min",
                            xtravars         = NULL,
                            summarize_vars   = c("temperature", "flow", "neph_rhcorrect"),
                            grouping_vars    = c("subjectid", "sessionid"))



gps <- get_sensor_data("gps",
                      do_aggregate     = TRUE,
                      aggregation_unit = "1 min",
                      xtravars         = NULL,
                      summarize_vars   = c("latitude", "longitude"),
                      grouping_vars    = c("subjectid", "sessionid"))

Add date and see if there's overlap

For the pre-pilot data the dates/times do not seem to correspond that well, so let's add a date column to make it easier to see where they match.

library(dplyr)
hex <- mutate(hex, date = format(begin, "%Y-%m-%d"))
micropem <- mutate(micropem, date = format(begin, "%Y-%m-%d"))
gps <- mutate(gps, date = format(begin, "%Y-%m-%d"))


# take a look at unique combos

select(hex, date, subjectid, sessionid) %>% distinct
select(micropem, date, subjectid, sessionid) %>% distinct
select(gps, date, subjectid, sessionid) %>% distinct

Join the data

In theory this would work -- but in the pre-pilot testing data there is not a lot of matching data.

hex_pem <- inner_join(hex, micropem, by=c("subjectid", "sessionid",  "begin", "end"))
hex_gps <- inner_join(hex, gps, by=c("subjectid","sessionid",  "begin", "end"))
pem_gps <- inner_join(micropem, gps, by=c("subjectid", "sessionid",  "begin", "end"))

# clean up example
pem_gps <- mutate(pem_gps, date = date.y) %>% 
  select(-c(interval.x, interval.y, date.x))

A map

library(ggplot2)
library(ggmap)
qmap("George Washington Bridge, NYC", zoom = 11, color = "bw", legend = "topleft")+
  geom_point(data=pem_gps, aes(longitude_avg, latitude_avg, color=temperature_avg))+
  scale_color_gradient(low="darkkhaki", high="darkgreen")

