Purpose

This demo walks through the functionality of our package, which implements the functional ANOVA test described in "Nonparametric comparisons of activity profiles from wearable device data" by Chang et al. The demo covers how to set up your data, run some exploratory analyses, and perform the test.

Set Up

library(tidyverse)
library(ActivityProfileR)

Simulating Data

For this demo, we use accelerometer data for three independent groups of subjects. The data were simulated by taking the positive part of an Ornstein-Uhlenbeck process. They were generated with a now-defunct package, so we simply load a saved version of the data instead.

data(example_activity)
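For readers who want to generate similar data themselves, the following is a minimal base R sketch of the idea, not the code that produced example_activity. The helper name simulate_positive_ou() and all parameter values (theta, mu, sigma, dt) are assumptions chosen for illustration; the process is discretized with a simple Euler-Maruyama step.

```r
# Hypothetical helper (not part of ActivityProfileR): Euler-Maruyama simulation of
# an Ornstein-Uhlenbeck process dX = theta * (mu - X) dt + sigma dW,
# followed by taking the positive part of the path.
simulate_positive_ou <- function(n_points = 1000, theta = 0.5, mu = 0,
                                 sigma = 1, dt = 0.01, x0 = 0) {
  x <- numeric(n_points)
  x[1] <- x0
  for (i in seq_len(n_points - 1)) {
    noise <- rnorm(1, mean = 0, sd = sqrt(dt))  # Brownian increment over dt
    x[i + 1] <- x[i] + theta * (mu - x[i]) * dt + sigma * noise
  }
  pmax(x, 0)  # keep only the positive part; negative values become 0
}

set.seed(1)
xt <- simulate_positive_ou()
```

Different groups could then be simulated by varying the illustrative parameters, for example a larger sigma for a more active group.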

We can take a look at an example of a subject's simulated activity data below:

example_activity %>% 
  unnest(Xt) %>% 
  filter(subject_id == 1, group == 1) %>% 
  mutate( t = 1:1000 ) %>% # time index; each simulated series has 1000 points
  ggplot(aes(x = t, y = Xt)) +
  geom_line() +
  labs(
    title = "Sample of a subject's simulated activity data",
    x = "Time (t)",
    y = "Activity (steps)"
  )

prepare_sim_data() uses each group's index in the params_by_group list as its group ID and stores it in a column named group. Each subject's activity data is stored in the Xt column as a nested tibble. ActivityProfileR assumes that any user-supplied data has the same form as the data created by prepare_sim_data().
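If your own data are in long format (one row per subject per time point), one way to reach a similar layout is to nest the activity values with tidyr. This is only a sketch: the toy data are made up, and the assumption that the inner tibble's column is also called Xt is inferred from the plotting code above rather than taken from the package documentation.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Hypothetical long-format data: one row per subject per time point
long_data <- tibble(
  subject_id = rep(1:4, each = 5),
  group      = rep(c(1, 1, 2, 2), each = 5),
  Xt         = abs(rnorm(20))
)

# Collapse each subject's measurements into a nested tibble stored in column Xt;
# the remaining columns (subject_id, group) identify each row
nested_data <- long_data %>%
  nest(Xt = Xt)
```

After nesting, nested_data has one row per subject, and unnest(Xt) recovers the long format.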

Producing The Activity Profile $T(\alpha)$

The functional ANOVA test described in the original manuscript takes the original activity data, denoted $X(t)$, and converts them into activity profiles, denoted $T(\alpha)$. Activity profiles are monotonically non-increasing functions: the value at a given activity level $\alpha$ is the proportion of a subject's data points that exceed $\alpha$.
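To make the definition concrete, here is a small base R sketch that computes an activity profile for one toy subject. The helper activity_profile() and the toy values are hypothetical and are not the package's implementation.

```r
# Hypothetical helper: proportion of observations strictly above each threshold
activity_profile <- function(x, thresholds) {
  sapply(thresholds, function(a) mean(x > a))
}

x <- c(0, 0, 1, 2, 3, 5, 8, 0, 4, 2)  # toy activity values for one subject
thresholds <- c(0, 1, 2, 4, 8)        # activity levels alpha

ta <- activity_profile(x, thresholds)
ta
# 0.7 0.6 0.4 0.2 0.0
```

The profile never increases as the threshold grows, because raising the threshold can only remove data points from the count. In the package, this conversion is handled for every subject by the call that follows.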

processed_data = example_activity %>% 
  add_helper_columns(group_col = group) %>% 
  add_activity_profile(group_col = group, 
                       activity_col = Xt,
                       grid_length = 100, 
                       quantiles = c(0.05, 0.95))

The add_helper_columns() function adds columns that are needed to calculate the test statistic. Specifically, it adds a column n, the number of subjects that share a given group ID, and a column gamma, the proportion of the entire sample that belongs to that group.
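As an illustration of what these two columns contain, the base R sketch below computes them by hand on a made-up data frame. It mirrors the description above and is not the code inside add_helper_columns().

```r
# Toy data: one row per subject, three groups of unequal size
toy <- data.frame(
  subject_id = 1:10,
  group      = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
)

# n: number of subjects sharing this row's group ID
toy$n <- as.integer(ave(toy$subject_id, toy$group, FUN = length))

# gamma: the group's share of the whole sample
toy$gamma <- toy$n / nrow(toy)

unique(toy[, c("group", "n", "gamma")])
```

Here the groups have 5, 3, and 2 subjects, so gamma is 0.5, 0.3, and 0.2, and the gamma values of the distinct groups sum to 1.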

add_activity_profile() constructs the activity profile for each subject. These profiles depend on the activity values across all groups, so this information needs to be supplied to the function. Thanks to tidyeval, you do not need to specify the column names as strings. In the simulated dataset, the group column is named group and the activity column is named Xt; adjust these arguments to match your input data as needed.

add_activity_profile() also adds a column ai, which stands for activity index. Since activity values can differ across datasets, these indices serve as placeholders for the actual activity values. grid_length and quantiles are optional arguments that you can supply to add_activity_profile() if you wish to change the number of activity values used or the range of quantiles kept. With the settings above, the first value, ai = 1, represents the 5th percentile of all activity values in the entire sample, while the last, ai = 100, represents the 95th percentile (per the specifications of the manuscript).
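The sketch below shows one way such a grid could be built from grid_length and quantiles. The pooled activity values are simulated for illustration, and the even spacing between the two endpoints is an assumption; the package may space its grid differently.

```r
set.seed(42)
# Toy pooled activity values from all subjects in all groups
pooled_activity <- rexp(5000, rate = 0.2)

grid_length <- 100
quantiles   <- c(0.05, 0.95)

# Endpoints: the 5th and 95th percentiles of the pooled values
endpoints <- quantile(pooled_activity, probs = quantiles, names = FALSE)

# Assumption: thresholds are evenly spaced between the two endpoints
activity_grid <- seq(endpoints[1], endpoints[2], length.out = grid_length)

# Activity index: ai = 1 maps to the 5th percentile, ai = 100 to the 95th
ai <- seq_len(grid_length)
```

Restricting the grid to the central quantiles keeps the thresholds away from extreme activity values, where few data points are available.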

The end result is that activity profiles have been created for each subject across all groups:

head(processed_data)

Visualizing $T(\alpha)$

We can visualize the activity profiles for each group by unnesting the data and plotting by group.

processed_data %>% 
  unnest(c(Ta, ai)) %>% 
  ggplot(aes(x = ai, y = Ta, group = subject_id, color = factor(group))) + 
  geom_line(alpha = 0.3) + 
  facet_grid(group ~ .) +
  theme(legend.position = "bottom") +
  labs(
    title = "Activity profiles for each subject by group",
    x = "Activity index alpha",
    y = "Proportion of data that exceeds\nactivity at index alpha"
  )

Calculating Functional ANOVA Test

The functions above are also used internally by EL_test(), which carries out the statistical test.

start = Sys.time()
el = processed_data %>% 
  EL_test(id_col = subject_id,
          group_col = group, 
          activity_col = Xt)
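end = Sys.time()
# `end - start` chooses its units automatically, so request minutes explicitly
elapsed = as.numeric(difftime(end, start, units = "mins"))
paste0("Regular version of EL_test() took ", 
       round(elapsed, 1), 
       " minutes to run.") %>% print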

The output of EL_test() is a list of values associated with the functional ANOVA test.

# Investigate the output of the test
el$sup_test
el$sup_EL_crit
el$out_sup_pval


thecbp/ActivityProfileR documentation built on April 10, 2021, 6:44 p.m.