The purpose of this demo is to walk through how to use the functionality of our package. This package creates the functional ANOVA test described in Nonparametric comparisons of activity profiles from wear- able device data by Chang et. al. Our demo will walk through how to set up your data, do some exploratory analyses and perform the test.
library(tidyverse) library(ActivityProfileR)
For this demo, we will generate accelerometer data for three independent groups of subjects. Accelerometer data is simulated by taking the positive components of a Ornstein-Uhlenbeck process. These data were generated using a now-defunct package, and we simply load a saved version of the data instead.
data(example_activity)
We can take a look at an example of a subject's simulated activity data below:
example_activity %>% unnest(Xt) %>% filter(subject_id == 1, group == 1) %>% mutate( t = 1:1000 ) %>% # same as n_points in group_params above ggplot(aes(x = t, y = Xt)) + geom_line() + labs( title = "Sample of a subject's simulated activity data", x = "Time (t)", y = "Activity (steps)" )
prepare_sim_data()
uses the index in the params_by_group
list to designate a group ID column. prepare_sim_data()
uses group
to name the group ID column. Activity data is put in the Xt
column as a nested tibble. ActivityProfileR
assumes that any user-specified data has the same form as the data
created by prepare_sim_data()
.
The functional ANOVA test described in the original manuscript takes the original activity data, denoted as $X(t)$, and converts them into activity profiles, denoted $T(\alpha)$. Activity profiles are monotonically decreasing functions, where a single point represents the proportion of data points greater than a given activity value.
processed_data = example_activity %>% add_helper_columns(group_col = group) %>% add_activity_profile(group_col = group, activity_col = Xt, grid_length = 100, quantiles = c(0.05, 0.95))
The add_helper_columns()
function adds some columns that help in the calculation of the statistical test. Specifically, it adds a column n
that represents how many subjects have a specific group ID and column gamma
, which represents the proportion that a group ID is of the entire sample.
The add_activity_profile()
constructs the activity profiles for each subject. These activity profiles depend on the activity values across all groups, so this information needs to be supplied to the function. Thanks to tidyeval, you do not need to specify the column names as a string. In the simulated dataset, the group column is represented as group
and activity column as Xt
, so this should be adjusted to the input data as needed. add_activity_profile()
also adds a column ai
, which stands for activity index. Since activity values can differ across dataset, these activity indices are used as a placeholder. grid_length
and quantiles
are optional arguments that you can supply to add_activity_profile
if you wish to change the amount of activity values to use or change the quantiles kept. The first value, ai = 1
, will represent the 5th percentile of all activity values in the entire sample, while the last ai = 100
will represent the 95th percentile (per the specifications of the manuscript).
The end result is that activity profiles have been created for each subject across all groups:
head(processed_data)
We can visualize the activity profiles for each group by unnesting the data and plotting by group.
processed_data %>% unnest(c(Ta, ai)) %>% ggplot(aes(x = ai, y = Ta, group = subject_id, color = factor(group))) + geom_line(alpha = 0.3) + facet_grid(group ~ .) + theme(legend.position = "bottom") + labs( title = "Activity profiles for each subject by group", x = "Activity index alpha", y = "Proportion of data that exceeds\nactivity at index alpha" )
The functions above are used internally in the EL_test()
, which calculates the statistical test.
start = Sys.time() el = processed_data %>% EL_test(id_col = subject_id, group_col = group, activity_col = Xt) end = Sys.time() paste0("Regular version of EL_test() took ", round(end - start, 1), " minutes to run.") %>% print
The output of EL_test()
is a list of values associated with the functional ANOVA test.
sup_test
: This is the supremum of all of the $-2log(\alpha)$ values calculated for each activity index $\alpha$.sup_EL_crit
: This is the 95% quantile of the bootstrap statistics calculated on the data.out_sup_pval
: This represents the proportion of bootstrap test statistics were greater than sup_test
.neg2logRa_tests
: These are the $-2log(\alpha)$ values calculated for each activity index $\alpha$.bootstrap
: These are all of the bootstrap datasets created from the original data.# Investigate the output of the test el$sup_test el$sup_EL_crit el$out_sup_pval
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.