The TwoRegression package allows users to quickly and accurately develop/apply two-regression algorithms to data from research-grade wearable devices. This vignette is designed to demonstrate usage of the package’s core features.
Before getting into that, it’s valuable to cover some history. The
package was initially established as a home for the models of Hibbing et al.
(2018). Since the initial release, support has been added for
developing/applying new models, as well as applying others from prior
research (see Crouter et
al.(2006), Crouter et
al.(2010), and Crouter et
al.(2012)). As of version 1.0.0, a new approach has been implemented
for invoking prior methods via the TwoRegression
function,
which we will look at in the following section. Afterwards, we will
cover the process of creating and cross-validating new models, plus
other aspects of using them effectively.
Prior models are implemented using the TwoRegression
function. Currently, support is available for the following:
It’s very important that you look at the TwoRegression
function documentation. It will help you understand what settings you
need to provide in order to run a specific model correctly. To view the
documentation, run the following:
::TwoRegression ?TwoRegression
This will pull up a documentation page where you can see the syntax
for calling the TwoRegression
function. Importantly, the
page also lists the syntax for several internal applicators (i.e.,
crouter_2006
, crouter_2010
,
crouter_2012
, and hibbing_2018
), which are the
functions that actually do the work of applying your selected model.
That is, the TwoRegression
function is just a wrapper
around those other internal functions, and based on the method you
select, TwoRegression
will call out to the corresponding
applicator. In most cases, you will need to designate some extra
settings for the applicator, which is why the syntax is listed in the
documentation file alongside the TwoRegression
syntax.
Arguments for the internal functions can be passed into the
TwoRegression
function directly, as if they were arguments
to that function itself. This will be easier to see and understand in
the coding samples later on in this section, but it’s important to be
aware of all this from the get-go.
The TwoRegression
function operates under the assumption
you already have data read into R. You can do this with the
AGread
package.
if (!"remotes" %in% installed.packages)
install.packages("remotes")
if (!"AGread" %in% installed.packages())
::install_github("paulhibbing/AGread") remotes
For the sake of this illustration, the TwoRegression package provides some sample data we can use. If you can get your own data into a form that mirrors the sample data below, you’ll be in good shape. Here’s how to access it:
data(count_data, package = "TwoRegression")
data(all_data, package = "TwoRegression")
The count_data
object contains activity count data (for
the Crouter two-regression models), while the all_data
object contains raw sensor data (for the Hibbing models). We can view
the first few rows of count data as follows:
::head(count_data)
utils#> time Axis1 Axis2 Axis3 Vector.Magnitude
#> 1 2015-11-12 15:45:00 71 284 287 410
#> 2 2015-11-12 15:45:10 0 0 0 0
#> 3 2015-11-12 15:45:20 0 0 0 0
#> 4 2015-11-12 15:45:30 0 0 0 0
#> 5 2015-11-12 15:45:40 0 0 0 0
#> 6 2015-11-12 15:45:50 0 0 0 0
We can do the same using a similar approach for the raw data. However, let’s remove some extraneous variables first.
<- all_data[ ,setdiff(
all_data names(all_data),
## These are the variables to remove:
c(
"file_source_PrimaryAccel", "date_processed_PrimaryAccel",
"file_source_IMU", "date_processed_IMU", "day_of_year", "minute_of_day",
## Remove the following because they'll be recalculated later
"ENMO_CV10s", "GVM_CV10s", "Direction"
)
)]
::head(all_data)
utils#> PID Timestamp ENMO Gyroscope_VM_DegPerS
#> 1 Test 2015-11-12 15:45:00 91.81064 0.6985270
#> 2 Test 2015-11-12 15:45:01 63.90690 0.6976330
#> 3 Test 2015-11-12 15:45:02 63.70351 0.6806886
#> 4 Test 2015-11-12 15:45:03 64.24208 0.6806373
#> 5 Test 2015-11-12 15:45:04 63.70059 0.6926852
#> 6 Test 2015-11-12 15:45:05 63.58346 0.6989126
#> mean_abs_Gyroscope_x_DegPerS mean_abs_Gyroscope_y_DegPerS
#> 1 0.2207365 0.5846348
#> 2 0.2344190 0.5713514
#> 3 0.2176725 0.5628797
#> 4 0.2382042 0.5606505
#> 5 0.2274410 0.5743000
#> 6 0.2290555 0.5746244
#> mean_abs_Gyroscope_z_DegPerS mean_magnetometer_direction
#> 1 0.3059072 S
#> 2 0.3203365 S
#> 3 0.3101393 S
#> 4 0.3001943 S
#> 5 0.3083593 S
#> 6 0.3191014 S
Once you have your dataset ready, it’s easy to apply a two-regression
model. Just invoke the TwoRegression
function like
this:
<- TwoRegression::TwoRegression(
crouter2006_results "Crouter 2006", movement_var = "Axis1", time_var = "time"
count_data,
)
<- TwoRegression::TwoRegression(
crouter2010_results "Crouter 2010", movement_var = "Axis1", time_var = "time"
count_data,
)
<- TwoRegression::TwoRegression(
crouter2012_va_results "Crouter 2012", movement_var = "Axis1",
count_data, time_var = "time", model = "VA", check = FALSE
)
<- TwoRegression::TwoRegression(
crouter2012_vm_results "Crouter 2012", movement_var = "Vector.Magnitude",
count_data, time_var = "time", model = "VM", check = FALSE
)
For the Crouter 2012 models, you have to choose between the vertical
axis model and the vector magnitude model. If you don’t set
check = FALSE
, you will get a warning about which movement
variable and model you’ve selected. This is meant as a prompt for you to
ensure your selected movement variable matches your selected model. Once
you’re confident in your selection, you can set
check = FALSE
and the warning won’t show up.
For the time being, you can only implement Crouter models one at a time. Of course, you can combine the output from multiple models yourself. Ideally, with ongoing development, a point will come where this can be done automatically and efficiently (see the GitHub issue on this topic), but for now it isn’t built in. As we’ll see in the following subsection, though, it is doable for the Hibbing models. Here’s a look at the output from the prior commands:
::head(crouter2006_results)
utils#> time Axis1 Axis2 Axis3 Vector.Magnitude cv_10 Classification
#> 1 2015-11-12 15:45:00 71 284 287 410 244.949 intermittent
#> 2 2015-11-12 15:46:00 0 0 0 0 0.000 SB
#> 3 2015-11-12 15:47:00 0 0 0 0 0.000 SB
#> 4 2015-11-12 15:48:00 0 0 0 0 0.000 SB
#> 5 2015-11-12 15:49:00 0 0 0 0 0.000 SB
#> METs
#> 1 2.446791
#> 2 1.000000
#> 3 1.000000
#> 4 1.000000
#> 5 1.000000
::head(crouter2010_results)
utils#> time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00 71 284 287 410 40.82483 5
#> 2 2015-11-12 15:46:00 0 0 0 0 0.00000 6
#> 3 2015-11-12 15:47:00 0 0 0 0 0.00000 6
#> 4 2015-11-12 15:48:00 0 0 0 0 0.00000 6
#> 5 2015-11-12 15:49:00 0 0 0 0 0.00000 6
#> walkrun_epochs intermittent_epochs METs
#> 1 0 1 1.28234
#> 2 0 0 1.00000
#> 3 0 0 1.00000
#> 4 0 0 1.00000
#> 5 0 0 1.00000
::head(crouter2012_va_results)
utils#> time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00 71 284 287 410 40.82483 5
#> 2 2015-11-12 15:46:00 0 0 0 0 0.00000 6
#> 3 2015-11-12 15:47:00 0 0 0 0 0.00000 6
#> 4 2015-11-12 15:48:00 0 0 0 0 0.00000 6
#> 5 2015-11-12 15:49:00 0 0 0 0 0.00000 6
#> walkrun_epochs intermittent_epochs METs
#> 1 0 1 1.34108
#> 2 0 0 1.00000
#> 3 0 0 1.00000
#> 4 0 0 1.00000
#> 5 0 0 1.00000
::head(crouter2012_vm_results)
utils#> time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00 71 284 287 410 40.82483 5
#> 2 2015-11-12 15:46:00 0 0 0 0 0.00000 6
#> 3 2015-11-12 15:47:00 0 0 0 0 0.00000 6
#> 4 2015-11-12 15:48:00 0 0 0 0 0.00000 6
#> 5 2015-11-12 15:49:00 0 0 0 0 0.00000 6
#> walkrun_epochs intermittent_epochs METs
#> 1 0 1 1.368691
#> 2 0 0 1.000000
#> 3 0 0 1.000000
#> 4 0 0 1.000000
#> 5 0 0 1.000000
The Hibbing models are implemented similarly to the Crouter models. A key difference, though, is that you can ask the function to run multiple models simultaneously. That’s what we’ll see in the following example:
<- TwoRegression::TwoRegression(
hibbing2018_results
"Hibbing 2018", accel_var = "ENMO",
all_data, gyro_var = "Gyroscope_VM_DegPerS",
direction_var = "mean_magnetometer_direction",
## Here is where we can select an algorithm from multiple sites:
site = c("Left Ankle", "Right Ankle"),
## And here is where we can select multiple algorithms
## (1 = accelerometer only; 2 = accelerometer and gyroscope;
## 3 = accelerometer, gyroscope, and magnetometer)
algorithm = 1:2,
## We can also ask the function to collapse data every minute by making an
## extra call to `smooth_2rm`
smooth = TRUE
)
::head(hibbing2018_results)
utils#> Timestamp PID mean_magnetometer_direction ENMO
#> 1 2015-11-12 15:45:00 Test S 3955.371
#> 2 2015-11-12 15:46:00 Test S 3971.312
#> 3 2015-11-12 15:47:00 Test S 3970.534
#> 4 2015-11-12 15:48:00 Test S 3929.218
#> 5 2015-11-12 15:49:00 Test S 3771.568
#> Gyroscope_VM_DegPerS mean_abs_Gyroscope_x_DegPerS
#> 1 41.23766 13.79911
#> 2 41.46834 13.82252
#> 3 41.77129 14.04145
#> 4 41.99182 14.09534
#> 5 41.40131 13.84211
#> mean_abs_Gyroscope_y_DegPerS mean_abs_Gyroscope_z_DegPerS ENMO_CV10s
#> 1 33.78546 18.90326 0.6910839
#> 2 33.91097 19.15887 0.4586548
#> 3 34.32867 18.90991 0.5657218
#> 4 34.66010 18.75561 0.5590702
#> 5 34.23208 18.42478 0.6726754
#> GVM_CV10s Left_Ankle_Algorithm1_SB_epochs
#> 1 0.8023038 0
#> 2 0.7353243 0
#> 3 0.7774376 0
#> 4 0.7712045 0
#> 5 0.7228010 0
#> Left_Ankle_Algorithm1_walkrun_epochs
#> 1 60
#> 2 60
#> 3 60
#> 4 60
#> 5 59
#> Left_Ankle_Algorithm1_intermittent_epochs Right_Ankle_Algorithm1_SB_epochs
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> Right_Ankle_Algorithm1_walkrun_epochs
#> 1 60
#> 2 60
#> 3 60
#> 4 60
#> 5 59
#> Right_Ankle_Algorithm1_intermittent_epochs Left_Ankle_Algorithm2_SB_epochs
#> 1 0 60
#> 2 0 60
#> 3 0 60
#> 4 0 60
#> 5 0 59
#> Left_Ankle_Algorithm2_walkrun_epochs
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> Left_Ankle_Algorithm2_intermittent_epochs Right_Ankle_Algorithm2_SB_epochs
#> 1 0 60
#> 2 0 60
#> 3 0 60
#> 4 0 60
#> 5 0 59
#> Right_Ankle_Algorithm2_walkrun_epochs
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> Right_Ankle_Algorithm2_intermittent_epochs Left_Ankle_Algorithm1_METs
#> 1 0 2.732142
#> 2 0 2.733321
#> 3 0 2.733263
#> 4 0 2.730209
#> 5 0 2.723279
#> Right_Ankle_Algorithm1_METs Left_Ankle_Algorithm2_METs
#> 1 2.463901 1.25
#> 2 2.465122 1.25
#> 3 2.465063 1.25
#> 4 2.461897 1.25
#> 5 2.454716 1.25
#> Right_Ankle_Algorithm2_METs
#> 1 1.25
#> 2 1.25
#> 3 1.25
#> 4 1.25
#> 5 1.25
So, each algorithm is run, and the information is stored in a unique and descriptive variable name.
The TwoRegression package is also useful if you want to create your
own model. To get this going, though, your dataset needs to have some
more complex information in it. We’ll use our previous
all_data
object in this illustration. First, we need to
label it with some pretend activity labels and energy expenditure values
(METs). In a real-life setting, the MET values would likely come from
indirect calorimetry. To create some of this imaginary data, we can run
the following:
set.seed(307)
<- c("Lying", "Sitting")
fake_sed <- c("Sweeping", "Dusting")
fake_lpa <- c("Walking", "Running")
fake_cwr <- c("Tennis", "Basketball")
fake_ila
<- c(fake_sed, fake_lpa, fake_cwr, fake_ila)
fake_activities
$Activity <- sample(fake_activities, nrow(all_data), TRUE)
all_data
$fake_METs <- ifelse(
all_data$Activity %in% c(fake_sed, fake_lpa),
all_datarunif(nrow(all_data), 1, 2),
runif(nrow(all_data), 2.5, 8)
)
For this demonstration, a couple of extra hacks are needed, which would be much more natural to handle with real data. Still, they’re helpful to see. First, we need to make sure our dataset has a column indicating which participant each data point came from. In this case, we’ll just label our data to pretend it came from two sample files instead of one (where ‘sample file’ is analogous to ‘participant’). The other step is calculating the coefficient of variation (CV). We technically could have avoided this by choosing not to delete the CV variables earlier. But that decision now gives us an excuse to show how convenient it is to calculate CV in the TwoRegression package.There were also some technical reasons for deleting the variables earlier, but nevermind that (see another GitHub issue if you’re curious).
$PID <- rep(
all_datac("Test1", "Test2"),
each = ceiling(nrow(all_data) / 2)
seq(nrow(all_data))]
)[
$ENMO_CV10s <- TwoRegression::cv_2rm(all_data$ENMO) all_data
When we go to fit the model, we’ll use the fit_2rm
function. There are a lot of arguments to provide here:
activity_var
that should be included when calibrating the
2RM sedentary cut pointactivity_var
that should be labeled as positive for
sedentary behavior when calibrating the 2RM sedentary cut pointsed_cp_var
falls below
the 2RM sedentary cut point)activity_var
that should be labeled as positive for
“continuous walking/running” (CWR) when calibrating the 2RM CWR cut
pointoutcome ~ predictors
)walkrun_formula
– note that data transformations like squaring or cubing should be
wrapped in I()
)From there, we can fit our model like this:
<- TwoRegression::fit_2rm(
my_model data = all_data,
activity_var = "Activity",
sed_cp_activities = c(fake_sed, fake_lpa),
sed_activities = fake_sed,
sed_cp_var = "ENMO",
sed_METs = 1.25,
walkrun_activities = fake_cwr,
walkrun_cp_var = "ENMO_CV10s",
met_var = "fake_METs",
walkrun_formula = "fake_METs ~ ENMO",
intermittent_formula = "fake_METs ~ ENMO + I(ENMO^2) + I(ENMO^3)"
)
The package provides summary and plot methods to understand, cross-validate, and visualize the model. Notably, this demonstration model is not meant to perform well or look pretty (the data are just numbers that have no real meaning), but we’ll still take a look at how to run the code.
As far as the summary method goes, this is where we need the participant identification column we set up earlier. Specifically, it will be used for leave-one-out cross-validation, where the data are split up into different chunks while the model is repeatedly re-fitted. Other information in the output includes a textual representation of the overall algorithm and summaries of the fit/performance of individual components (i.e., ROC and regression analyses). To pull all of this up, you just have to run code that matches the following pattern:
summary(
my_model,subject_var = "PID",
MET_var = "fake_METs",
activity_var = "Activity"
)#> $cut_points
#> $cut_points$sedentary_ENMO
#> threshold specificity sensitivity
#> 65.7094366 0.5068493 0.5362319
#>
#> $cut_points$walk_run_ENMO
#> threshold specificity sensitivity
#> 0.4777372 0.5777778 0.5185185
#>
#>
#> $regression_models
#> $regression_models$walk_run
#> formula adj.r.squared see
#> 1 fake_METs ~ ENMO -0.01142529 2.264023
#>
#> $regression_models$intermittent_activity
#> formula adj.r.squared see
#> 1 fake_METs ~ ENMO + I(ENMO^2) + I(ENMO^3) -0.03106764 2.152466
#>
#>
#> $leave_one_out
#> RMSE MAPE
#> 1 4358.838 13675.04
#>
#> $algorithm
#> [1] "If ENMO <= 65.71: METS = 1.25"
#> [2] "If ENMO > 65.71 AND ENMO_CV10s <= 0.48: METs = 18.336 - 0.226 (ENMO)"
#> [3] "If ENMO > 65.71 AND ENMO_CV10s > 0.48: METs = 5306.279 - 216.4804 (ENMO) + 2.9170 (I(ENMO^2)) - 0.0129 (I(ENMO^3))"
For the plot function, you’ll need to fill in some of the same values
from the original call to fit_2rm
. Use code that matches
the following pattern:
## You have to explicitly type `object = ` for this to work
plot(
object = my_model,
sed_cp_activities = c(fake_sed, fake_lpa),
sed_activities = fake_sed,
sed_cpVar = "ENMO",
activity_var = "Activity",
met_var = "fake_METs",
walkrun_activities = fake_cwr,
walkrun_cpVar = "ENMO_CV10s",
print = TRUE
)
Once you’ve created your model, you want to use it on new data.
That’s easy to do using the predict
method included in the
package. If we pretend our all_data
object is a new
dataset, we could get predictions by running code like this:
<- predict(my_model, all_data)
new_results
::head(new_results)
utils#> PID Timestamp ENMO Gyroscope_VM_DegPerS
#> 1 Test1 2015-11-12 15:45:00 91.81064 0.6985270
#> 2 Test1 2015-11-12 15:45:01 63.90690 0.6976330
#> 3 Test1 2015-11-12 15:45:02 63.70351 0.6806886
#> 4 Test1 2015-11-12 15:45:03 64.24208 0.6806373
#> 5 Test1 2015-11-12 15:45:04 63.70059 0.6926852
#> 6 Test1 2015-11-12 15:45:05 63.58346 0.6989126
#> mean_abs_Gyroscope_x_DegPerS mean_abs_Gyroscope_y_DegPerS
#> 1 0.2207365 0.5846348
#> 2 0.2344190 0.5713514
#> 3 0.2176725 0.5628797
#> 4 0.2382042 0.5606505
#> 5 0.2274410 0.5743000
#> 6 0.2290555 0.5746244
#> mean_abs_Gyroscope_z_DegPerS mean_magnetometer_direction Activity fake_METs
#> 1 0.3059072 S Sitting 1.853070
#> 2 0.3203365 S Tennis 7.402845
#> 3 0.3101393 S Running 5.903824
#> 4 0.3001943 S Basketball 2.685811
#> 5 0.3083593 S Running 6.890708
#> 6 0.3191014 S Sitting 1.253249
#> ENMO_CV10s Classification METs
#> 1 13.3261509 intermittent 1.85306
#> 2 0.6433354 SB 1.25000
#> 3 0.6433354 SB 1.25000
#> 4 0.6433354 SB 1.25000
#> 5 0.6433354 SB 1.25000
#> 6 0.6433354 SB 1.25000
When making predictions, you can specify verbose = TRUE
if you want to print a message to the console about making predictions
from your model. By default, it will say it’s making predictions using
the ‘user_unspecified’ model. To give your model a name, you can assign
a value to its method
element. Consider the following:
<- predict(my_model, all_data, verbose = TRUE)
results_default #>
#> Calculating EE using the 'user_unspecified' two-regression model
$method <- "My Customized 2RM"
my_model
<- predict(my_model, all_data, verbose = TRUE)
results_updated #>
#> Calculating EE using the 'My Customized 2RM' two-regression model
And, of course, you can collapse the estimates to a particular time
granularity using smooth_2rm
like this:
## This code illustrates collapsing every 60 seconds. (This is the default
## period and also the typical recommendation, but you could do anything,
## e.g., "10 sec", "30 sec", or "0.25 hour")
::smooth_2rm(results_updated, "Timestamp", "60 sec")
TwoRegression#> Timestamp PID mean_magnetometer_direction Activity ENMO
#> 1 2015-11-12 15:45:00 Test1 S Sitting 3955.371
#> 2 2015-11-12 15:46:00 Test1 S Basketball 3971.312
#> 3 2015-11-12 15:47:00 Test1 S Walking 3970.534
#> 4 2015-11-12 15:48:00 Test2 S Basketball 3929.218
#> 5 2015-11-12 15:49:00 Test2 S Running 3771.568
#> Gyroscope_VM_DegPerS mean_abs_Gyroscope_x_DegPerS
#> 1 41.23766 13.79911
#> 2 41.46834 13.82252
#> 3 41.77129 14.04145
#> 4 41.99182 14.09534
#> 5 41.40131 13.84211
#> mean_abs_Gyroscope_y_DegPerS mean_abs_Gyroscope_z_DegPerS ENMO_CV10s
#> 1 33.78546 18.90326 0.6910839
#> 2 33.91097 19.15887 0.4586548
#> 3 34.32867 18.90991 0.5657218
#> 4 34.66010 18.75561 0.5590702
#> 5 34.23208 18.42478 0.6726754
#> SB_epochs walkrun_epochs intermittent_epochs fake_METs METs
#> 1 28 26 6 3.452270 2.398673
#> 2 8 35 17 3.521533 3.139028
#> 3 10 19 31 3.186677 3.128555
#> 4 41 4 15 3.882197 2.001488
#> 5 59 0 0 3.610416 1.250000
That’s it. This has been a quick crash course in the core features and functions of the TwoRegression package. If you have questions or feedback, feel free to connect by posting an issue on the TwoRegression GitHub page. Happy coding!