Introduction to lifelogr
In lifelogr: Life Logging

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

vignette("Report", package = "lifelogr")

library(lifelogr)

Lifelogging is the "process of tracking personal data generated by our own behavioral activities." Examples of lifelogging include tracking one's exercise and sleep. While devices like iPhones and Fitbits make collecting data on oneself easy, analyzing and interpreting that data is more difficult.

The lifelogr package makes life logging analysis and interpretation easy by:

Providing a framework for working with multiple sources of personal data.
Allowing users to easily create three sets of meaningful visualizations.
Creating an ecosystem for experimentation - a feedback loop between self-data, and the process generating it.

Working with Multiple Sources of Data: the "Person" class

This package is built around the Person class, an R6 class. An object of Person class encapsulates a complete picture of a user, as seen through her data. The goal of this class is to make working with multiple sources of data as easy as possible by joining and standardizing multiple datasets. Users can then access the data from and across sources for visualization and experimentation.

Upon instantiation, a user can provide information about himself, the filepath to a csv file with Apple data, Fitbit account information, and/or dataframes of data from other sources.

Loading Multi-Source Data into Person Objects

Fitbit users can provide their account username and password upon instantiation. We use functions from the fitbitScraper package to collect all of a user's Fitbit data between the start and end date of interest.

EX <- Person$new(fitbit_user_email = "example@domain.com", 
                 fitbit_user_pw = "password",
                 apple_data_file = "apple.csv",
                 addl_data = NA,
                 user_info = list("name" = "EX", "age" = 29, "gender" = "male"),
                 target_steps = 10000,
                 group_assignments = list(data.frame(NA), data.frame(NA)),
                 start_date = "2017-01-19", end_date = "2017-02-17")

load("../data/EX.rda")

iPhone users can provide a filepath to a csv with their Apple health data. Users can download their data using an iOS app called QS Access (quantified self), which allows for selection of variables and time scales (by day or by hour), to include in a csv file. For greater compatibility with the lifelogr package, users are highly recommended to select the "by hour" time interval.

Users can provide data from other sources and self-tracking apps as pre-cleaned dataframes in the addl_data or addl_data2 arguments. To make maximal use of lifelogr's functions, the column names of the same variables in the dataframe should match those referenced in the Person class, and class documentation (when the variables in the dataframe are the same as those referred to in the documentation).

Example of Person Class

EX is an example of a Person class for a user who provided their Fitbit account information upon instantiation. Functions within the Person class gathered and manipulated the data into data from three main "sources":

fitbit_daily has Fitbit data recorded on a daily basis
fitbit_intraday has Fitbit data recorded on an intraday basis, generally every 15 minutes, although for certain variables, such as heart rate, data are recorded every 5 minutes.
fitbit_util has information about the dates in the range specified by the user upon Person instantiation, such as which day of the week a specific date is.

Dates, Times, and Date-Times

Dates and times come in numerous forms. lifelogr recognizes three types of dates/times as acceptable time-related variables to join datasets on:

date: "2017-03-20"
time: "08:00:00"
datetime: "2017-03-20 08:00:00", a combination of a date and a time

Upon instantiation, the Person class automatically converts and renames Fitbit and Apple health data to adhere to this variable naming and format convention. If users provide their own addl_data as dataframes, they will need to have one or more of these time-related variables as columns in their dataframes in order to be able to join across their data source, and Fitbit/Apple health data.

Visualization

There are 3 functions which allow for a series of plots with just one function call: plot_sleep_all, plot_daily_all, and plot_intraday_all. Each acts like the plot.lm function, where users must click "enter" to see the next plot.

Each plot within each of the three series can also be generated individually with a call to its appropriate plot function.

For example, using plot_sleep_all to generate all the sleep plots for EX:

plot_sleep_all(EX)

Calling plot_daily_all and plot_intraday_all will result in similar series of plots.

Users can also call each function individually using plot_sleep(person, plot_type) (or plot_daily or plot_intraday), and also pass in other arguments for plots with other arguments.

For example:

plot_sleep(EX, "by_start_end_time", "day_of_week")

Using plot_daily:

plot_daily(EX, "steps")

Using plot_intraday: The default is for plot_intraday to group the data by time intervals within each day so that data for a "typical day" is displayed.

plot_intraday(EX, "distance", unit = "km")

However, it is also possible to specify that the plots use the raw data and plot over all datetimes.

plot_intraday(EX, "bpm", FALSE)

Experimentation Framework

While almost all the visualizations shown above had date, date-time, or time on the x axis, the experimentation framework allows for greater flexibility. Users can select any variables of interest and examine their relationship.

Users can specify

x variables of interest
y measures of interest
type of analysis ("plot", "correlation", "anova", "compare_groups", or "regression")
time variable (date, time, or datetime)

The experiment function studies the effect of each variable, or all variables together (depending on the analysis), of the variable (X) and measure variables (Y) given.

Users can also perform each analysis using the functions l_plot, correlation, l_anova, compare_groups, or l_regression instead of the higher-level function experiment.

The analysis functions below perform specific analyses, but using the standardized datasets in the fields of the Person class, the create_dataset function, which joins datasets across source types, and the templates provided by these analysis functions, users can easily perform their own unique analyses.

Plots

To plot the relationship between sleep duration and resting heart rate and the relationship between day of week and resting heart rate on a date basis: r experiment(person = EX, variables = list("fitbit_daily" = c("sleepDuration"), "util" = c("day_of_week")), measures = list("fitbit_daily" = c("restingHeartRate")), analysis = c("plot"), time_var = c("date"))
Correlation

To calculate the correlation between sleep duration and number of steps on a date basis: r experiment(person = EX, variables = list("fitbit_daily" = c("sleepDuration")), measures = list("fitbit_intraday" = c("distance")), analysis = c("correlation"), time_var = c("date")) l_plot will generate the same result: ```r dataset <- create_dataset(person = EX, all_variables = list("fitbit_daily" = c("sleepDuration"), "fitbit_intraday" = c("distance")), time_var = c("date"))
```
correlation_df <- correlation(dataset, person = EX, 
                              variables = list("fitbit_daily" = c("sleepDuration")),
                              measures = list("fitbit_intraday" = c("distance")),
                              time_var = "date")
```
```
ANOVA

To create ANOVAs for the effect of sleep duration and steps on resting heart rate on a date basis: r experiment(person = EX, variables = list("fitbit_daily" = c("sleepDuration", "steps")), measures = list("fitbit_daily" = c("restingHeartRate")), analysis = c("anova"), time_var = c("date"))
Compare Groups

To compare across time variables not already defined, such as months, users can set up groups. Groups must be part of the Person instance, so it is oftentimes easier to just use compare_groups, which instead allows users to pass in new dataframes defining groupings.

In this case, only data from January and February is in the sample instance of Person, so only two groups are compared: ```r dataset <- create_dataset(person = EX, all_variables = list("util" = c("month"), "fitbit_daily" = c("sleepDuration", "steps", "restingHeartRate")), time_var = c("date"))

indiv_months <- data.frame("month" = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), "group" = c(1:12))

compare_groups(dataset, person = EX, addl_grouping_assignments = list("indiv_months" = indiv_months), names_of_groupings = c("indiv_months"), variables_to_compare = c("steps", "restingHeartRate")) ```
Regression

To run a regression of resting heart rate on steps ^ 2 r experiment(person = EX, variables = list("fitbit_daily" = c("steps")), measures = list("fitbit_daily" = c("restingHeartRate")), analysis = c("regression"), time_var = c("date"))