Get started
In mpathsenser: Process and Analyse Data from m-Path Sense

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

First, make sure we load some useful libraries (and of course mpathsenser itself).

library(tidyr)
library(dplyr)
library(ggplot2)
library(mpathsenser)

Importing files

The data for this vignette is contained in the extdata folder. However, on some systems this folder may be read-only, and it is generally good practice not to modify package folders (to avoid changing or breaking the package). Therefore, we first copy the data to a temporary directory. This is the per-session directory returned by tempdir(), which R creates at start-up inside the location given by the environment variable TMPDIR, TMP, or TEMP, and removes when the session ends.

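# Get the session's temporary folder and create a subfolder for this vignette
tempdir <- tempdir()
tempdir <- file.path(tempdir, "vignette")
dir.create(tempdir, showWarnings = FALSE) # no warning if it already exists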

# Get a handle to the data files
path <- system.file("extdata", "example", package = "mpathsenser")

# Get a list of all the files that are to be copied
copy_list <- list.files(path, "carp-data", full.names = TRUE)

# Copy all data
file.copy(
  from = copy_list,
  to = tempdir,
  overwrite = TRUE,
  copy.mode = FALSE
)

The extdata folder contains several .zip files as well as some JSON files. The data for your own study will probably look similar, only with many more files. Note that all of these data files came directly from m-Path Sense (i.e. they have not been pre-processed).

The data from m-Path Sense comes about as follows. The app continuously collects all kinds of data in the background (e.g. accelerometer data). Once collected, the data goes through several stages in which it may, for example, be pre-processed (as already happens with data from the light sensor) or anonymised upon request. Finally, the data is written to a JSON file, which is simply a text file with a specific format. Whenever new data comes in (whether from the same sensor or not), it is written as the next line of the JSON file, and so on, until the file reaches a certain size (5MB by default). The JSON file is then zipped to reduce its size and subsequently transferred to a server. Once transferred, the data is deleted from the participant's phone, both to save space and to prevent data leakage.
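The size-based rotation described above can be sketched in base R. This is a hypothetical illustration, not m-Path Sense's actual code: the function name rotate_json(), the file names data-001.json, data-002.json, ..., and the assumption that each record is already a serialised JSON object are all made up for this example.

```r
# Hypothetical sketch of size-based JSON file rotation (not m-Path Sense code).
# `records` is assumed to be a character vector of serialised JSON objects.
rotate_json <- function(records, dir, max_bytes = 5 * 1024^2) {
  paths <- character(0)
  con <- NULL
  size <- 0
  for (rec in records) {
    if (is.null(con)) {
      # Start a new file by opening the JSON array
      path <- file.path(dir, sprintf("data-%03d.json", length(paths) + 1))
      paths <- c(paths, path)
      con <- file(path, open = "w")
      writeLines("[", con)
      size <- 0
    } else {
      # Separate consecutive records with a comma
      writeLines(",", con)
    }
    # Each record is written as a new line, as soon as it comes in
    writeLines(rec, con)
    # Approximate the file size by the bytes of the records written so far
    size <- size + nchar(rec, type = "bytes")
    if (size >= max_bytes) {
      # Close the array and the file once the size limit is reached
      writeLines("]", con)
      close(con)
      con <- NULL
    }
  }
  # Close the last, partially filled file
  if (!is.null(con)) {
    writeLines("]", con)
    close(con)
  }
  paths
}

# Example: 10 small records with a tiny size limit give several files
out_dir <- tempfile()
dir.create(out_dir)
out <- rotate_json(sprintf('{"id": %d}', 1:10), out_dir, max_bytes = 30)
```

If writing stops before the closing ] is written (for example because the app is killed), the last file remains an unclosed array, which is the problem that fix_jsons() deals with later in this vignette.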

Thus, a first step is to unzip these files to extract their JSON contents. If you prefer to unzip them with your favourite zip program, you can do so; just make sure all files end up in the same directory (including the JSON files that were not zipped).

unzip_data(path = tempdir)

In m-Path Sense, data is written to JSON files as it comes in. Each file holds a JSON array, so it starts with [ and ends with ]. If the app is killed, the JSON file is not properly closed and hence cannot be read by JSON parsers. So, we must first test whether all files are in a valid JSON format and fix those that are not.

You may directly run fix_jsons() to fix all files in the directory that need fixing, or you can first run test_jsons() to see how many files need fixing. Running fix_jsons() also runs test_jsons() implicitly, so that only the files that really need fixing are modified.

# Note that test_jsons returns the full path names
to_fix <- test_jsons(tempdir)
print(to_fix)

# to_fix contains full paths, so no base path is needed
fix_jsons(path = NULL, files = to_fix)
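Conceptually, fixing such a file means adding the missing closing bracket. The sketch below illustrates that idea in base R. It is a hypothetical example, not how test_jsons() or fix_jsons() are implemented: is_closed_json_array() and close_json_array() are made-up names, the check only looks at the first and last non-empty lines instead of fully parsing the JSON, and only a missing closing bracket is handled.

```r
# Hypothetical check: does the file look like a closed JSON array?
is_closed_json_array <- function(path) {
  # Keep only non-empty lines, ignoring surrounding whitespace
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  length(lines) > 0 &&
    startsWith(lines[1], "[") &&
    endsWith(lines[length(lines)], "]")
}

# Hypothetical fix: append the missing closing bracket.
# Returns (invisibly) TRUE if the file was modified, FALSE otherwise.
# Note: lines are rewritten without their leading/trailing whitespace.
close_json_array <- function(path) {
  if (is_closed_json_array(path)) return(invisible(FALSE))
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) {
    # An empty file becomes an empty array
    writeLines("[]", path)
  } else {
    n <- length(lines)
    # Drop a trailing comma left behind by an interrupted write
    lines[n] <- sub(",$", "", lines[n])
    writeLines(c(lines, "]"), path)
  }
  invisible(TRUE)
}

# Example: a file that was cut off after the second record
f <- tempfile(fileext = ".json")
writeLines(c("[", '{"id": 1},', '{"id": 2},'), f)
close_json_array(f)
readLines(f)
```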

Next, we can create an mpathsenser database using the create_db() function. This function takes as its first argument the path where the database will be created and as its second argument db_name, the name of the database file. You may also leave path set to NULL and specify a full path (including the file name) in the db_name argument. Always remember to save the output to a variable, so that you can use it later and do not end up with an open but unused handle on the database connection.

# Create a new database
db <- create_db(tempdir, "getstarted.db")

Finally, we can import the data into the newly created database using the import() function. This function takes as its first argument the path to the directory where the data is located and as its second argument the database handle. You may also specify batch_size to control how many files are written to the database at once. Files are processed one by one (although parallel processing is supported via the future package), but for the sake of efficiency they are written to the database in batches. A larger batch size is thus more efficient, but if a single file fails to be processed, the entire (larger) batch fails.

# Import the data
import(
  path = tempdir,
  db = db,
  sensors = NULL, # All sensors
  batch_size = 12,
  recursive = TRUE
)
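The trade-off behind batch_size can be illustrated with a small base R sketch. This is not how import() is implemented; import_in_batches() and its process and write_batch arguments are hypothetical names. The sketch only models the behaviour described above: files are processed one by one, each batch is written at once, and a single failing file causes its whole batch to be skipped.

```r
# Hypothetical model of batched imports (not mpathsenser's implementation).
# `process` turns one file into a data frame; `write_batch` stores a data frame.
import_in_batches <- function(files, batch_size, process, write_batch) {
  # Split the files into consecutive batches of at most `batch_size`
  batches <- split(files, ceiling(seq_along(files) / batch_size))
  ok <- logical(length(batches))
  for (i in seq_along(batches)) {
    ok[i] <- tryCatch({
      # Process each file in the batch one by one
      data <- lapply(batches[[i]], process)
      # Write all processed files of this batch in a single operation
      write_batch(do.call(rbind, data))
      TRUE
    }, error = function(e) FALSE) # any failing file discards the whole batch
  }
  ok
}

# Example: five files, batch size 2, and file "f3" cannot be processed
written <- character(0)
store <- function(df) written <<- c(written, df$file)
proc <- function(f) {
  if (f == "f3") stop("corrupt file")
  data.frame(file = f, stringsAsFactors = FALSE)
}
res <- import_in_batches(paste0("f", 1:5), 2, proc, store)
```

Here the failure of f3 also discards f4, although f4 itself was processed without problems. A smaller batch size limits such losses at the cost of more write operations.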

Creating a coverage chart

To get a better overview of the data, we can create a coverage chart that shows the average number of collected samples per hour for each sensor for a single participant. An important distinction must be made between a relative and an absolute coverage chart. The absolute coverage chart shows the average number of samples per hour for each sensor. The relative coverage chart expresses this number as a percentage of the expected number of samples per hour for that sensor. The relative chart is therefore useful to check whether each sensor collects data as often as expected, while the absolute chart shows how much data was actually collected.

sensors <- c(
  "Accelerometer", "Activity", "AppUsage", "Bluetooth", "Calendar",
  "Connectivity", "Device", "Gyroscope", "InstalledApps", "Light",
  "Location", "Memory", "Pedometer", "Screen", "Weather", "Wifi"
)
cov <- coverage(
  db = db,
  participant_id = "2784",
  sensor = sensors,
  relative = FALSE
)

print(cov)
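To make the difference between absolute and relative coverage concrete, here is a base R sketch for a single sensor. coverage_sketch() is a hypothetical function, expected_per_hour is an assumed sampling rate, and averaging over the number of distinct days with data is an assumption of this sketch; coverage() may compute these numbers differently.

```r
# Hypothetical sketch of absolute vs relative coverage for one sensor.
# `timestamps` are POSIXct measurement times; `expected_per_hour` is assumed.
coverage_sketch <- function(timestamps, expected_per_hour) {
  # Hour of the day (0-23) of each sample
  hours <- as.integer(format(timestamps, "%H", tz = "UTC"))
  # Number of distinct days with data (assumed averaging denominator)
  days <- length(unique(as.Date(timestamps, tz = "UTC")))
  # Count samples per hour of the day
  counts <- tabulate(hours + 1, nbins = 24)
  absolute <- counts / days # average samples per hour
  relative <- 100 * absolute / expected_per_hour # % of expected samples
  data.frame(hour = 0:23, absolute = absolute, relative = relative)
}

ts <- as.POSIXct(c("2024-01-01 10:05:00", "2024-01-01 10:35:00",
                   "2024-01-02 10:15:00", "2024-01-02 11:00:00"), tz = "UTC")
cov_s <- coverage_sketch(ts, expected_per_hour = 2)
cov_s[cov_s$hour %in% 10:11, ]
# hour 10: absolute 1.5, relative 75; hour 11: absolute 0.5, relative 25
```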