knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The TwoRegression package allows users to quickly and accurately develop/apply two-regression algorithms to data from research-grade wearable devices. This vignette is designed to demonstrate usage of the package's core features.
Before getting into that, it's valuable to cover some history. The package was initially established as a home for the models of Hibbing et al. (2018). Since the initial release, support has been added for developing/applying new models, as well as applying others from prior research (see Crouter et al. (2006), Crouter et al. (2010), and Crouter et al. (2012)). As of version 1.0.0, a new approach has been implemented for invoking prior methods via the TwoRegression function, which we will look at in the following section. Afterwards, we will cover the process of creating and cross-validating new models, plus other aspects of using them effectively.
Prior models are implemented using the TwoRegression function. Currently, support is available for the models of Crouter et al. (2006), Crouter et al. (2010), Crouter et al. (2012), and Hibbing et al. (2018).
It's very important that you look at the TwoRegression function documentation. It will help you understand what settings you need to provide in order to run a specific model correctly. To view the documentation, run the following:
?TwoRegression::TwoRegression
This will pull up a documentation page where you can see the syntax for calling the TwoRegression function. Importantly, the page also lists the syntax for several internal applicators (i.e., crouter_2006, crouter_2010, crouter_2012, and hibbing_2018), which are the functions that actually do the work of applying your selected model. That is, the TwoRegression function is just a wrapper around those other internal functions, and based on the method you select, TwoRegression will call out to the corresponding applicator. In most cases, you will need to designate some extra settings for the applicator, which is why the syntax is listed in the documentation file alongside the TwoRegression syntax. Arguments for the internal functions can be passed into the TwoRegression function directly, as if they were arguments to that function itself. This will be easier to see and understand in the coding samples later on in this section, but it's important to be aware of all this from the get-go.
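To make the wrapper idea concrete, here is a minimal base R sketch of the general pattern: a wrapper picks an applicator based on a method name and forwards any extra arguments through `...`. The function names (`apply_wrapper`, `apply_model_a`, `apply_model_b`) and their arguments are hypothetical stand-ins, not the package's actual internals.

```r
# Hypothetical applicators standing in for a package's internal functions
apply_model_a <- function(data, movement_var, scale = 1) {
  data[[movement_var]] * scale
}
apply_model_b <- function(data, movement_var, offset = 0) {
  data[[movement_var]] + offset
}

# Hypothetical wrapper: selects an applicator based on `method` and forwards
# any extra arguments to it through `...`
apply_wrapper <- function(data, method = c("Model A", "Model B"), ...) {
  method <- match.arg(method)
  applicator <- switch(
    method,
    "Model A" = apply_model_a,
    "Model B" = apply_model_b
  )
  applicator(data, ...)
}

toy <- data.frame(counts = c(10, 20, 30))

# `movement_var`, `scale`, and `offset` are not arguments of `apply_wrapper`
# itself; they pass straight through to the selected applicator
result_a <- apply_wrapper(toy, "Model A", movement_var = "counts", scale = 2)
result_b <- apply_wrapper(toy, "Model B", movement_var = "counts", offset = 5)
```

The practical consequence of this design is that the wrapper's own argument list stays short, while each applicator's documentation tells you which extra settings are needed.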
The TwoRegression function operates under the assumption you already have data read into R. You can do this with the AGread package.
if (!"remotes" %in% installed.packages()) install.packages("remotes")
if (!"AGread" %in% installed.packages()) remotes::install_github("paulhibbing/AGread")
For the sake of this illustration, the TwoRegression package provides some sample data we can use. If you can get your own data into a form that mirrors the sample data below, you'll be in good shape. Here's how to access it:
data(count_data, package = "TwoRegression")
data(all_data, package = "TwoRegression")
The count_data object contains activity count data (for the Crouter two-regression models), while the all_data object contains raw sensor data (for the Hibbing models). We can view the first few rows of count data as follows:
utils::head(count_data)
We can do the same using a similar approach for the raw data. However, let's remove some extraneous variables first.
all_data <- all_data[ , setdiff(
  names(all_data),
  ## These are the variables to remove:
  c(
    "file_source_PrimaryAccel", "date_processed_PrimaryAccel",
    "file_source_IMU", "date_processed_IMU",
    "day_of_year", "minute_of_day",
    ## Remove the following because they'll be recalculated later
    "ENMO_CV10s", "GVM_CV10s", "Direction"
  )
)]
utils::head(all_data)
Once you have your dataset ready, it's easy to apply a two-regression model. Just invoke the TwoRegression function like this:
crouter2006_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2006",
  movement_var = "Axis1", time_var = "time"
)
crouter2010_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2010",
  movement_var = "Axis1", time_var = "time"
)
crouter2012_va_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2012",
  movement_var = "Axis1", time_var = "time",
  model = "VA", check = FALSE
)
crouter2012_vm_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2012",
  movement_var = "Vector.Magnitude", time_var = "time",
  model = "VM", check = FALSE
)
For the Crouter 2012 models, you have to choose between the vertical axis model and the vector magnitude model. If you don't set check = FALSE, you will get a warning about which movement variable and model you've selected. This is meant as a prompt for you to ensure your selected movement variable matches your selected model. Once you're confident in your selection, you can set check = FALSE and the warning won't show up.
For the time being, you can only implement Crouter models one at a time. Of course, you can combine the output from multiple models yourself. Ideally, with ongoing development, a point will come where this can be done automatically and efficiently (see the GitHub issue on this topic), but for now it isn't built in. As we'll see in the following subsection, though, it is doable for the Hibbing models. Here's a look at the output from the prior commands:
utils::head(crouter2006_results)
utils::head(crouter2010_results)
utils::head(crouter2012_va_results)
utils::head(crouter2012_vm_results)
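If you do want to combine the output from several models by hand, one possible approach is sketched below with toy data. It assumes each result is a data frame with one row per input row, in the same order, and that each model only adds new columns. The column names METs_model1 and METs_model2 are hypothetical, so check the names in your own output first.

```r
# Toy stand-ins for the input data and two sets of model results; the column
# names `METs_model1` and `METs_model2` are hypothetical
input <- data.frame(time = 1:3, Axis1 = c(0, 150, 900))
results_1 <- cbind(input, METs_model1 = c(1.0, 2.1, 4.3))
results_2 <- cbind(input, METs_model2 = c(1.0, 2.4, 4.0))

# Keep only the columns that the second model added, then bind them on
# (this assumes the rows of both results line up one-to-one)
new_cols <- setdiff(names(results_2), names(results_1))
combined <- cbind(results_1, results_2[ , new_cols, drop = FALSE])
combined
```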
The Hibbing models are implemented similarly to the Crouter models. A key difference, though, is that you can ask the function to run multiple models simultaneously. That's what we'll see in the following example:
hibbing2018_results <- TwoRegression::TwoRegression(
  all_data, "Hibbing 2018",
  accel_var = "ENMO",
  gyro_var = "Gyroscope_VM_DegPerS",
  direction_var = "mean_magnetometer_direction",
  ## Here is where we can select an algorithm from multiple sites:
  site = c("Left Ankle", "Right Ankle"),
  ## And here is where we can select multiple algorithms
  ## (1 = accelerometer only; 2 = accelerometer and gyroscope;
  ## 3 = accelerometer, gyroscope, and magnetometer)
  algorithm = 1:2,
  ## We can also ask the function to collapse data every minute by making an
  ## extra call to `smooth_2rm`
  smooth = TRUE
)
utils::head(hibbing2018_results)
So, each algorithm is run, and the information is stored in a unique and descriptive variable name.
The TwoRegression package is also useful if you want to create your own model. To get this going, though, your dataset needs to have some more complex information in it. We'll use our previous all_data object in this illustration. First, we need to label it with some pretend activity labels and energy expenditure values (METs). In a real-life setting, the MET values would likely come from indirect calorimetry. To create some of this imaginary data, we can run the following:
set.seed(307)

fake_sed <- c("Lying", "Sitting")
fake_lpa <- c("Sweeping", "Dusting")
fake_cwr <- c("Walking", "Running")
fake_ila <- c("Tennis", "Basketball")
fake_activities <- c(fake_sed, fake_lpa, fake_cwr, fake_ila)

all_data$Activity <- sample(fake_activities, nrow(all_data), TRUE)

all_data$fake_METs <- ifelse(
  all_data$Activity %in% c(fake_sed, fake_lpa),
  runif(nrow(all_data), 1, 2),
  runif(nrow(all_data), 2.5, 8)
)
For this demonstration, a couple of extra hacks are needed, which would be much more natural to handle with real data. Still, they're helpful to see. First, we need to make sure our dataset has a column indicating which participant each data point came from. In this case, we'll just label our data to pretend it came from two sample files instead of one (where 'sample file' is analogous to 'participant'). The other step is calculating the coefficient of variation (CV). We technically could have avoided this by choosing not to delete the CV variables earlier. But that decision now gives us an excuse to show how convenient it is to calculate CV in the TwoRegression package. There were also some technical reasons for deleting the variables earlier, but never mind that (see another GitHub issue if you're curious).
all_data$PID <- rep(
  c("Test1", "Test2"),
  each = ceiling(nrow(all_data) / 2)
)[seq(nrow(all_data))]

all_data$ENMO_CV10s <- TwoRegression::cv_2rm(all_data$ENMO)
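For readers unfamiliar with the CV itself, the sketch below shows one simple way to compute it over a trailing window in base R. This is only an illustration of the concept (standard deviation divided by mean, expressed as a percentage). The helper names cv_percent and rolling_cv are hypothetical, and the windowing scheme is an assumption that may well differ from what cv_2rm does internally.

```r
# Coefficient of variation (%) for one window of values
cv_percent <- function(x) {
  if (mean(x) == 0) return(0)  # avoid dividing by zero for all-zero windows
  stats::sd(x) / mean(x) * 100
}

# Trailing-window CV: each value is summarized together with up to
# `window - 1` values before it (an assumed windowing scheme)
rolling_cv <- function(x, window = 10) {
  vapply(seq_along(x), function(i) {
    start <- max(1, i - window + 1)
    if (i - start + 1 < 2) return(NA_real_)  # sd needs at least two values
    cv_percent(x[start:i])
  }, numeric(1))
}

toy_enmo <- c(5, 5, 5, 5, 10, 20, 5, 5)
toy_cv <- rolling_cv(toy_enmo, window = 4)
toy_cv
```

Steady signals give a CV near zero, while erratic signals give a large CV, which is what makes the CV useful for separating continuous walking/running from intermittent activity.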
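When we go to fit the model, we'll use the fit_2rm function. There are a lot of arguments to provide here:
data: the dataset used to fit the model
activity_var: the name of the variable that holds the activity labels
sed_cp_activities: the values of activity_var that should be included when calibrating the 2RM sedentary cut point
sed_activities: the values of activity_var that should be labeled as positive for sedentary behavior when calibrating the 2RM sedentary cut point
sed_cp_var: the name of the variable on which the sedentary cut point is based
sed_METs: the MET value to assign for sedentary behavior (i.e., when sed_cp_var falls below the 2RM sedentary cut point)
walkrun_activities: the values of activity_var that should be labeled as positive for "continuous walking/running" (CWR) when calibrating the 2RM CWR cut point
walkrun_cp_var: the name of the variable on which the CWR cut point is based
met_var: the name of the variable that holds the criterion MET values
walkrun_formula: the formula for the CWR regression model (of the form outcome ~ predictors)
intermittent_formula: the formula for the intermittent activity regression model (same form as walkrun_formula -- note that data transformations like squaring or cubing should be wrapped in I())
From there, we can fit our model like this: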
my_model <- TwoRegression::fit_2rm(
  data = all_data,
  activity_var = "Activity",
  sed_cp_activities = c(fake_sed, fake_lpa),
  sed_activities = fake_sed,
  sed_cp_var = "ENMO",
  sed_METs = 1.25,
  walkrun_activities = fake_cwr,
  walkrun_cp_var = "ENMO_CV10s",
  met_var = "fake_METs",
  walkrun_formula = "fake_METs ~ ENMO",
  intermittent_formula = "fake_METs ~ ENMO + I(ENMO^2) + I(ENMO^3)"
)
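To see how the pieces of a two-regression model fit together, here is a rough base R sketch of the general decision logic: data below the sedentary cut point get a fixed MET value, data with a low CV go through the walk/run regression, and everything else goes through the intermittent activity regression. The cut points, coefficients, and the function name predict_2rm_sketch are all made up for illustration. They do not come from my_model or from any published algorithm, and the exact comparison rules in the package may differ.

```r
# Hypothetical cut points and settings, chosen only for illustration
sed_cutpoint <- 50      # threshold on the sedentary cut point variable
sed_METs <- 1.25        # fixed MET value assigned to sedentary data
walkrun_cutpoint <- 15  # threshold on the CV variable

predict_2rm_sketch <- function(enmo, cv) {
  # Hypothetical regression equations (linear for CWR, quadratic otherwise)
  walkrun_mets <- 2.0 + 0.010 * enmo
  intermittent_mets <- 1.5 + 0.012 * enmo - 1e-6 * enmo^2
  ifelse(
    enmo <= sed_cutpoint,
    sed_METs,
    ifelse(cv <= walkrun_cutpoint, walkrun_mets, intermittent_mets)
  )
}

toy <- data.frame(
  ENMO = c(20, 300, 300),
  ENMO_CV10s = c(5, 8, 60)
)
toy$METs <- predict_2rm_sketch(toy$ENMO, toy$ENMO_CV10s)
toy
```

In this toy example, the first row is classified as sedentary, the second as continuous walking/running, and the third as intermittent activity.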
The package provides summary and plot methods to understand, cross-validate, and visualize the model. Notably, this demonstration model is not meant to perform well or look pretty (the data are just numbers that have no real meaning), but we'll still take a look at how to run the code.
As far as the summary method goes, this is where we need the participant identification column we set up earlier. Specifically, it will be used for leave-one-out cross-validation, where the data are split up into different chunks while the model is repeatedly re-fitted. Other information in the output includes a textual representation of the overall algorithm and summaries of the fit/performance of individual components (i.e., ROC and regression analyses). To pull all of this up, you just have to run code that matches the following pattern:
summary(
  my_model,
  subject_var = "PID",
  MET_var = "fake_METs",
  activity_var = "Activity"
)
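If the cross-validation idea is new to you, the following base R sketch shows the general pattern of holding out one participant at a time with a plain linear model and toy data. It is only meant to illustrate the concept. The summary method's actual procedure and output are not reproduced here, and the data and column names are invented.

```r
set.seed(1)

# Toy data: three pretend participants with a linear ENMO-to-METs relationship
toy <- data.frame(
  PID = rep(c("P1", "P2", "P3"), each = 20),
  ENMO = runif(60, 50, 500)
)
toy$METs <- 1.5 + 0.01 * toy$ENMO + rnorm(60, sd = 0.5)

# Hold out one participant at a time, refit on the rest, and predict the
# held-out participant's data
toy$predicted <- NA_real_
for (pid in unique(toy$PID)) {
  held_out <- toy$PID == pid
  fit <- stats::lm(METs ~ ENMO, data = toy[!held_out, ])
  toy$predicted[held_out] <- stats::predict(fit, newdata = toy[held_out, ])
}

# Root mean squared error across all held-out predictions
rmse <- sqrt(mean((toy$METs - toy$predicted)^2))
rmse
```

Because every prediction comes from a model that never saw that participant's data, the resulting error is a more honest estimate of how the model would perform on new people.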
For the plot function, you'll need to fill in some of the same values from the original call to fit_2rm. Use code that matches the following pattern:
## You have to explicitly type `object = ` for this to work
plot(
  object = my_model,
  sed_cp_activities = c(fake_sed, fake_lpa),
  sed_activities = fake_sed,
  sed_cpVar = "ENMO",
  activity_var = "Activity",
  met_var = "fake_METs",
  walkrun_activities = fake_cwr,
  walkrun_cpVar = "ENMO_CV10s",
  print = TRUE
)
Once you've created your model, you want to use it on new data. That's easy to do using the predict method included in the package. If we pretend our all_data object is a new dataset, we could get predictions by running code like this:
new_results <- predict(my_model, all_data)
utils::head(new_results)
When making predictions, you can specify verbose = TRUE if you want to print a message to the console about making predictions from your model. By default, it will say it's making predictions using the 'user_unspecified' model. To give your model a name, you can assign a value to its method element. Consider the following:
results_default <- predict(my_model, all_data, verbose = TRUE)

my_model$method <- "My Customized 2RM"
results_updated <- predict(my_model, all_data, verbose = TRUE)
And, of course, you can collapse the estimates to a particular time granularity using smooth_2rm like this:
## This code illustrates collapsing every 60 seconds. (This is the default
## period and also the typical recommendation, but you could do anything,
## e.g., "10 sec", "30 sec", or "0.25 hour")
TwoRegression::smooth_2rm(results_updated, "Timestamp", "60 sec")
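Conceptually, collapsing to a coarser time granularity amounts to grouping rows by time window and summarizing within each group. The base R sketch below illustrates that idea with toy second-by-second data, assuming the summary is a simple mean. It is not the implementation of smooth_2rm, and the column names are hypothetical.

```r
# Toy second-by-second data (two minutes' worth); column names are hypothetical
toy <- data.frame(
  Timestamp = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 0:119,
  METs = rep(c(2, 4), each = 60)
)

# Label each row with the start of its minute, then average within minutes
toy$minute <- cut(toy$Timestamp, breaks = "1 min")
collapsed <- stats::aggregate(METs ~ minute, data = toy, FUN = mean)
collapsed
```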
That's it. This has been a quick crash course in the core features and functions of the TwoRegression package. If you have questions or feedback, feel free to connect by posting an issue on the TwoRegression GitHub page. Happy coding!