The TwoRegression Package

Paul R. Hibbing

INTRODUCTION

The TwoRegression package allows users to quickly and accurately develop/apply two-regression algorithms to data from research-grade wearable devices. This vignette is designed to demonstrate usage of the package’s core features.

Before getting into that, it’s valuable to cover some history. The package was initially established as a home for the models of Hibbing et al. (2018). Since the initial release, support has been added for developing/applying new models, as well as applying others from prior research (see Crouter et al.(2006), Crouter et al.(2010), and Crouter et al.(2012)). As of version 1.0.0, a new approach has been implemented for invoking prior methods via the TwoRegression function, which we will look at in the following section. Afterwards, we will cover the process of creating and cross-validating new models, plus other aspects of using them effectively.

IMPLEMENTING TWO-REGRESSION MODELS THAT ALREADY EXIST

Approach and Options

Prior models are implemented using the TwoRegression function. Currently, support is available for the following:

Help documentation

It’s very important that you look at the TwoRegression function documentation. It will help you understand what settings you need to provide in order to run a specific model correctly. To view the documentation, run the following:


?TwoRegression::TwoRegression

This will pull up a documentation page where you can see the syntax for calling the TwoRegression function. Importantly, the page also lists the syntax for several internal applicators (i.e., crouter_2006, crouter_2010, crouter_2012, and hibbing_2018), which are the functions that actually do the work of applying your selected model. That is, the TwoRegression function is just a wrapper around those other internal functions, and based on the method you select, TwoRegression will call out to the corresponding applicator. In most cases, you will need to designate some extra settings for the applicator, which is why the syntax is listed in the documentation file alongside the TwoRegression syntax. Arguments for the internal functions can be passed into the TwoRegression function directly, as if they were arguments to that function itself. This will be easier to see and understand in the coding samples later on in this section, but it’s important to be aware of all this from the get-go.

Assumptions and Data

The TwoRegression function operates under the assumption you already have data read into R. You can do this with the AGread package.


if (!"remotes" %in% installed.packages)
  install.packages("remotes")

if (!"AGread" %in% installed.packages())
  remotes::install_github("paulhibbing/AGread")

For the sake of this illustration, the TwoRegression package provides some sample data we can use. If you can get your own data into a form that mirrors the sample data below, you’ll be in good shape. Here’s how to access it:


data(count_data, package = "TwoRegression")
data(all_data, package = "TwoRegression")

The count_data object contains activity count data (for the Crouter two-regression models), while the all_data object contains raw sensor data (for the Hibbing models). We can view the first few rows of count data as follows:


utils::head(count_data)
#>                  time Axis1 Axis2 Axis3 Vector.Magnitude
#> 1 2015-11-12 15:45:00    71   284   287              410
#> 2 2015-11-12 15:45:10     0     0     0                0
#> 3 2015-11-12 15:45:20     0     0     0                0
#> 4 2015-11-12 15:45:30     0     0     0                0
#> 5 2015-11-12 15:45:40     0     0     0                0
#> 6 2015-11-12 15:45:50     0     0     0                0

We can do the same using a similar approach for the raw data. However, let’s remove some extraneous variables first.


all_data <- all_data[ ,setdiff(
  names(all_data),
  ## These are the variables to remove:
  c(
    "file_source_PrimaryAccel", "date_processed_PrimaryAccel",
    "file_source_IMU", "date_processed_IMU", "day_of_year", "minute_of_day",
    ## Remove the following because they'll be recalculated later
    "ENMO_CV10s", "GVM_CV10s", "Direction"
  )
)]

utils::head(all_data)
#>    PID           Timestamp     ENMO Gyroscope_VM_DegPerS
#> 1 Test 2015-11-12 15:45:00 91.81064            0.6985270
#> 2 Test 2015-11-12 15:45:01 63.90690            0.6976330
#> 3 Test 2015-11-12 15:45:02 63.70351            0.6806886
#> 4 Test 2015-11-12 15:45:03 64.24208            0.6806373
#> 5 Test 2015-11-12 15:45:04 63.70059            0.6926852
#> 6 Test 2015-11-12 15:45:05 63.58346            0.6989126
#>   mean_abs_Gyroscope_x_DegPerS mean_abs_Gyroscope_y_DegPerS
#> 1                    0.2207365                    0.5846348
#> 2                    0.2344190                    0.5713514
#> 3                    0.2176725                    0.5628797
#> 4                    0.2382042                    0.5606505
#> 5                    0.2274410                    0.5743000
#> 6                    0.2290555                    0.5746244
#>   mean_abs_Gyroscope_z_DegPerS mean_magnetometer_direction
#> 1                    0.3059072                           S
#> 2                    0.3203365                           S
#> 3                    0.3101393                           S
#> 4                    0.3001943                           S
#> 5                    0.3083593                           S
#> 6                    0.3191014                           S

Applying Crouter models (for activity count data)

Once you have your dataset ready, it’s easy to apply a two-regression model. Just invoke the TwoRegression function like this:


crouter2006_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2006", movement_var = "Axis1", time_var = "time"
)

crouter2010_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2010", movement_var = "Axis1", time_var = "time"
)

crouter2012_va_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2012", movement_var = "Axis1",
  time_var = "time", model = "VA", check = FALSE
)

crouter2012_vm_results <- TwoRegression::TwoRegression(
  count_data, "Crouter 2012", movement_var = "Vector.Magnitude",
  time_var = "time", model = "VM", check = FALSE
)

For the Crouter 2012 models, you have to choose between the vertical axis model and the vector magnitude model. If you don’t set check = FALSE, you will get a warning about which movement variable and model you’ve selected. This is meant as a prompt for you to ensure your selected movement variable matches your selected model. Once you’re confident in your selection, you can set check = FALSE and the warning won’t show up.

For the time being, you can only implement Crouter models one at a time. Of course, you can combine the output from multiple models yourself. Ideally, with ongoing development, a point will come where this can be done automatically and efficiently (see the GitHub issue on this topic), but for now it isn’t built in. As we’ll see in the following subsection, though, it is doable for the Hibbing models. Here’s a look at the output from the prior commands:


utils::head(crouter2006_results)
#>                  time Axis1 Axis2 Axis3 Vector.Magnitude   cv_10 Classification
#> 1 2015-11-12 15:45:00    71   284   287              410 244.949   intermittent
#> 2 2015-11-12 15:46:00     0     0     0                0   0.000             SB
#> 3 2015-11-12 15:47:00     0     0     0                0   0.000             SB
#> 4 2015-11-12 15:48:00     0     0     0                0   0.000             SB
#> 5 2015-11-12 15:49:00     0     0     0                0   0.000             SB
#>       METs
#> 1 2.446791
#> 2 1.000000
#> 3 1.000000
#> 4 1.000000
#> 5 1.000000
utils::head(crouter2010_results)
#>                  time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00    71   284   287              410   40.82483         5
#> 2 2015-11-12 15:46:00     0     0     0                0    0.00000         6
#> 3 2015-11-12 15:47:00     0     0     0                0    0.00000         6
#> 4 2015-11-12 15:48:00     0     0     0                0    0.00000         6
#> 5 2015-11-12 15:49:00     0     0     0                0    0.00000         6
#>   walkrun_epochs intermittent_epochs    METs
#> 1              0                   1 1.28234
#> 2              0                   0 1.00000
#> 3              0                   0 1.00000
#> 4              0                   0 1.00000
#> 5              0                   0 1.00000
utils::head(crouter2012_va_results)
#>                  time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00    71   284   287              410   40.82483         5
#> 2 2015-11-12 15:46:00     0     0     0                0    0.00000         6
#> 3 2015-11-12 15:47:00     0     0     0                0    0.00000         6
#> 4 2015-11-12 15:48:00     0     0     0                0    0.00000         6
#> 5 2015-11-12 15:49:00     0     0     0                0    0.00000         6
#>   walkrun_epochs intermittent_epochs    METs
#> 1              0                   1 1.34108
#> 2              0                   0 1.00000
#> 3              0                   0 1.00000
#> 4              0                   0 1.00000
#> 5              0                   0 1.00000
utils::head(crouter2012_vm_results)
#>                  time Axis1 Axis2 Axis3 Vector.Magnitude mean_cv_10 SB_epochs
#> 1 2015-11-12 15:45:00    71   284   287              410   40.82483         5
#> 2 2015-11-12 15:46:00     0     0     0                0    0.00000         6
#> 3 2015-11-12 15:47:00     0     0     0                0    0.00000         6
#> 4 2015-11-12 15:48:00     0     0     0                0    0.00000         6
#> 5 2015-11-12 15:49:00     0     0     0                0    0.00000         6
#>   walkrun_epochs intermittent_epochs     METs
#> 1              0                   1 1.368691
#> 2              0                   0 1.000000
#> 3              0                   0 1.000000
#> 4              0                   0 1.000000
#> 5              0                   0 1.000000

Applying Hibbing models (for raw sensor data)

The Hibbing models are implemented similarly to the Crouter models. A key difference, though, is that you can ask the function to run multiple models simultaneously. That’s what we’ll see in the following example:


hibbing2018_results <- TwoRegression::TwoRegression(
  
  all_data, "Hibbing 2018", accel_var = "ENMO",
  gyro_var = "Gyroscope_VM_DegPerS",
  direction_var = "mean_magnetometer_direction",
  
  ## Here is where we can select an algorithm from multiple sites:
  site = c("Left Ankle", "Right Ankle"),
  
  ## And here is where we can select multiple algorithms
  ## (1 = accelerometer only; 2 = accelerometer and gyroscope;
  ## 3 = accelerometer, gyroscope, and magnetometer)
  algorithm = 1:2,
  
  ## We can also ask the function to collapse data every minute by making an
  ## extra call to `smooth_2rm`
  smooth = TRUE
)

utils::head(hibbing2018_results)
#>             Timestamp  PID mean_magnetometer_direction     ENMO
#> 1 2015-11-12 15:45:00 Test                           S 3955.371
#> 2 2015-11-12 15:46:00 Test                           S 3971.312
#> 3 2015-11-12 15:47:00 Test                           S 3970.534
#> 4 2015-11-12 15:48:00 Test                           S 3929.218
#> 5 2015-11-12 15:49:00 Test                           S 3771.568
#>   Gyroscope_VM_DegPerS mean_abs_Gyroscope_x_DegPerS
#> 1             41.23766                     13.79911
#> 2             41.46834                     13.82252
#> 3             41.77129                     14.04145
#> 4             41.99182                     14.09534
#> 5             41.40131                     13.84211
#>   mean_abs_Gyroscope_y_DegPerS mean_abs_Gyroscope_z_DegPerS ENMO_CV10s
#> 1                     33.78546                     18.90326  0.6910839
#> 2                     33.91097                     19.15887  0.4586548
#> 3                     34.32867                     18.90991  0.5657218
#> 4                     34.66010                     18.75561  0.5590702
#> 5                     34.23208                     18.42478  0.6726754
#>   GVM_CV10s Left_Ankle_Algorithm1_SB_epochs
#> 1 0.8023038                               0
#> 2 0.7353243                               0
#> 3 0.7774376                               0
#> 4 0.7712045                               0
#> 5 0.7228010                               0
#>   Left_Ankle_Algorithm1_walkrun_epochs
#> 1                                   60
#> 2                                   60
#> 3                                   60
#> 4                                   60
#> 5                                   59
#>   Left_Ankle_Algorithm1_intermittent_epochs Right_Ankle_Algorithm1_SB_epochs
#> 1                                         0                                0
#> 2                                         0                                0
#> 3                                         0                                0
#> 4                                         0                                0
#> 5                                         0                                0
#>   Right_Ankle_Algorithm1_walkrun_epochs
#> 1                                    60
#> 2                                    60
#> 3                                    60
#> 4                                    60
#> 5                                    59
#>   Right_Ankle_Algorithm1_intermittent_epochs Left_Ankle_Algorithm2_SB_epochs
#> 1                                          0                              60
#> 2                                          0                              60
#> 3                                          0                              60
#> 4                                          0                              60
#> 5                                          0                              59
#>   Left_Ankle_Algorithm2_walkrun_epochs
#> 1                                    0
#> 2                                    0
#> 3                                    0
#> 4                                    0
#> 5                                    0
#>   Left_Ankle_Algorithm2_intermittent_epochs Right_Ankle_Algorithm2_SB_epochs
#> 1                                         0                               60
#> 2                                         0                               60
#> 3                                         0                               60
#> 4                                         0                               60
#> 5                                         0                               59
#>   Right_Ankle_Algorithm2_walkrun_epochs
#> 1                                     0
#> 2                                     0
#> 3                                     0
#> 4                                     0
#> 5                                     0
#>   Right_Ankle_Algorithm2_intermittent_epochs Left_Ankle_Algorithm1_METs
#> 1                                          0                   2.732142
#> 2                                          0                   2.733321
#> 3                                          0                   2.733263
#> 4                                          0                   2.730209
#> 5                                          0                   2.723279
#>   Right_Ankle_Algorithm1_METs Left_Ankle_Algorithm2_METs
#> 1                    2.463901                       1.25
#> 2                    2.465122                       1.25
#> 3                    2.465063                       1.25
#> 4                    2.461897                       1.25
#> 5                    2.454716                       1.25
#>   Right_Ankle_Algorithm2_METs
#> 1                        1.25
#> 2                        1.25
#> 3                        1.25
#> 4                        1.25
#> 5                        1.25

So, each algorithm is run, and the information is stored in a unique and descriptive variable name.

FITTING AND EXAMINING NEW MODELS

Background and Setup

The TwoRegression package is also useful if you want to create your own model. To get this going, though, your dataset needs to have some more complex information in it. We’ll use our previous all_data object in this illustration. First, we need to label it with some pretend activity labels and energy expenditure values (METs). In a real-life setting, the MET values would likely come from indirect calorimetry. To create some of this imaginary data, we can run the following:


set.seed(307)

fake_sed <- c("Lying", "Sitting")
fake_lpa <- c("Sweeping", "Dusting")
fake_cwr <- c("Walking", "Running")
fake_ila <- c("Tennis", "Basketball")

fake_activities <- c(fake_sed, fake_lpa, fake_cwr, fake_ila)

all_data$Activity <- sample(fake_activities, nrow(all_data), TRUE)

all_data$fake_METs <- ifelse(
  all_data$Activity %in% c(fake_sed, fake_lpa),
  runif(nrow(all_data), 1, 2),
  runif(nrow(all_data), 2.5, 8)
)

For this demonstration, a couple of extra hacks are needed, which would be much more natural to handle with real data. Still, they’re helpful to see. First, we need to make sure our dataset has a column indicating which participant each data point came from. In this case, we’ll just label our data to pretend it came from two sample files instead of one (where ‘sample file’ is analogous to ‘participant’). The other step is calculating the coefficient of variation (CV). We technically could have avoided this by choosing not to delete the CV variables earlier. But that decision now gives us an excuse to show how convenient it is to calculate CV in the TwoRegression package.There were also some technical reasons for deleting the variables earlier, but nevermind that (see another GitHub issue if you’re curious).


all_data$PID <- rep(
  c("Test1", "Test2"),
  each = ceiling(nrow(all_data) / 2)
)[seq(nrow(all_data))]

all_data$ENMO_CV10s <- TwoRegression::cv_2rm(all_data$ENMO)

Fitting the Model

When we go to fit the model, we’ll use the fit_2rm function. There are a lot of arguments to provide here:

From there, we can fit our model like this:


my_model <- TwoRegression::fit_2rm(
  data = all_data,
  activity_var = "Activity",
  sed_cp_activities = c(fake_sed, fake_lpa),
  sed_activities = fake_sed,
  sed_cp_var = "ENMO",
  sed_METs = 1.25,
  walkrun_activities = fake_cwr,
  walkrun_cp_var = "ENMO_CV10s",
  met_var = "fake_METs",
  walkrun_formula = "fake_METs ~ ENMO",
  intermittent_formula = "fake_METs ~ ENMO + I(ENMO^2) + I(ENMO^3)"
)

Examining Model Performance

The package provides summary and plot methods to understand, cross-validate, and visualize the model. Notably, this demonstration model is not meant to perform well or look pretty (the data are just numbers that have no real meaning), but we’ll still take a look at how to run the code.

As far as the summary method goes, this is where we need the participant identification column we set up earlier. Specifically, it will be used for leave-one-out cross-validation, where the data are split up into different chunks while the model is repeatedly re-fitted. Other information in the output includes a textual representation of the overall algorithm and summaries of the fit/performance of individual components (i.e., ROC and regression analyses). To pull all of this up, you just have to run code that matches the following pattern:


summary(
  my_model,
  subject_var = "PID",
  MET_var = "fake_METs",
  activity_var = "Activity"
)
#> $cut_points
#> $cut_points$sedentary_ENMO
#>   threshold specificity sensitivity 
#>  65.7094366   0.5068493   0.5362319 
#> 
#> $cut_points$walk_run_ENMO
#>   threshold specificity sensitivity 
#>   0.4777372   0.5777778   0.5185185 
#> 
#> 
#> $regression_models
#> $regression_models$walk_run
#>            formula adj.r.squared      see
#> 1 fake_METs ~ ENMO   -0.01142529 2.264023
#> 
#> $regression_models$intermittent_activity
#>                                    formula adj.r.squared      see
#> 1 fake_METs ~ ENMO + I(ENMO^2) + I(ENMO^3)   -0.03106764 2.152466
#> 
#> 
#> $leave_one_out
#>       RMSE     MAPE
#> 1 4358.838 13675.04
#> 
#> $algorithm
#> [1] "If ENMO <= 65.71: METS = 1.25"                                                                                         
#> [2] "If ENMO > 65.71 AND ENMO_CV10s <= 0.48: METs = 18.336 - 0.226 (ENMO)"                                                  
#> [3] "If ENMO > 65.71 AND ENMO_CV10s > 0.48: METs = 5306.279 - 216.4804 (ENMO) +   2.9170 (I(ENMO^2)) -   0.0129 (I(ENMO^3))"

For the plot function, you’ll need to fill in some of the same values from the original call to fit_2rm. Use code that matches the following pattern:


## You have to explicitly type `object = ` for this to work
plot(
  object = my_model,
  sed_cp_activities = c(fake_sed, fake_lpa),
  sed_activities = fake_sed,
  sed_cpVar = "ENMO",
  activity_var = "Activity",
  met_var = "fake_METs",
  walkrun_activities = fake_cwr,
  walkrun_cpVar = "ENMO_CV10s",
  print = TRUE
)

USING NEW MODELS

Once you’ve created your model, you want to use it on new data. That’s easy to do using the predict method included in the package. If we pretend our all_data object is a new dataset, we could get predictions by running code like this:


new_results <- predict(my_model, all_data)

utils::head(new_results)
#>     PID           Timestamp     ENMO Gyroscope_VM_DegPerS
#> 1 Test1 2015-11-12 15:45:00 91.81064            0.6985270
#> 2 Test1 2015-11-12 15:45:01 63.90690            0.6976330
#> 3 Test1 2015-11-12 15:45:02 63.70351            0.6806886
#> 4 Test1 2015-11-12 15:45:03 64.24208            0.6806373
#> 5 Test1 2015-11-12 15:45:04 63.70059            0.6926852
#> 6 Test1 2015-11-12 15:45:05 63.58346            0.6989126
#>   mean_abs_Gyroscope_x_DegPerS mean_abs_Gyroscope_y_DegPerS
#> 1                    0.2207365                    0.5846348
#> 2                    0.2344190                    0.5713514
#> 3                    0.2176725                    0.5628797
#> 4                    0.2382042                    0.5606505
#> 5                    0.2274410                    0.5743000
#> 6                    0.2290555                    0.5746244
#>   mean_abs_Gyroscope_z_DegPerS mean_magnetometer_direction   Activity fake_METs
#> 1                    0.3059072                           S    Sitting  1.853070
#> 2                    0.3203365                           S     Tennis  7.402845
#> 3                    0.3101393                           S    Running  5.903824
#> 4                    0.3001943                           S Basketball  2.685811
#> 5                    0.3083593                           S    Running  6.890708
#> 6                    0.3191014                           S    Sitting  1.253249
#>   ENMO_CV10s Classification    METs
#> 1 13.3261509   intermittent 1.85306
#> 2  0.6433354             SB 1.25000
#> 3  0.6433354             SB 1.25000
#> 4  0.6433354             SB 1.25000
#> 5  0.6433354             SB 1.25000
#> 6  0.6433354             SB 1.25000

When making predictions, you can specify verbose = TRUE if you want to print a message to the console about making predictions from your model. By default, it will say it’s making predictions using the ‘user_unspecified’ model. To give your model a name, you can assign a value to its method element. Consider the following:


results_default <- predict(my_model, all_data, verbose = TRUE)
#> 
#> Calculating EE using the 'user_unspecified' two-regression model


my_model$method <- "My Customized 2RM"


results_updated <- predict(my_model, all_data, verbose = TRUE)
#> 
#> Calculating EE using the 'My Customized 2RM' two-regression model

And, of course, you can collapse the estimates to a particular time granularity using smooth_2rm like this:


## This code illustrates collapsing every 60 seconds. (This is the default
## period and also the typical recommendation, but you could do anything,
## e.g., "10 sec", "30 sec", or "0.25 hour")
TwoRegression::smooth_2rm(results_updated, "Timestamp", "60 sec")
#>             Timestamp   PID mean_magnetometer_direction   Activity     ENMO
#> 1 2015-11-12 15:45:00 Test1                           S    Sitting 3955.371
#> 2 2015-11-12 15:46:00 Test1                           S Basketball 3971.312
#> 3 2015-11-12 15:47:00 Test1                           S    Walking 3970.534
#> 4 2015-11-12 15:48:00 Test2                           S Basketball 3929.218
#> 5 2015-11-12 15:49:00 Test2                           S    Running 3771.568
#>   Gyroscope_VM_DegPerS mean_abs_Gyroscope_x_DegPerS
#> 1             41.23766                     13.79911
#> 2             41.46834                     13.82252
#> 3             41.77129                     14.04145
#> 4             41.99182                     14.09534
#> 5             41.40131                     13.84211
#>   mean_abs_Gyroscope_y_DegPerS mean_abs_Gyroscope_z_DegPerS ENMO_CV10s
#> 1                     33.78546                     18.90326  0.6910839
#> 2                     33.91097                     19.15887  0.4586548
#> 3                     34.32867                     18.90991  0.5657218
#> 4                     34.66010                     18.75561  0.5590702
#> 5                     34.23208                     18.42478  0.6726754
#>   SB_epochs walkrun_epochs intermittent_epochs fake_METs     METs
#> 1        28             26                   6  3.452270 2.398673
#> 2         8             35                  17  3.521533 3.139028
#> 3        10             19                  31  3.186677 3.128555
#> 4        41              4                  15  3.882197 2.001488
#> 5        59              0                   0  3.610416 1.250000

CONCLUSION

That’s it. This has been a quick crash course in the core features and functions of the TwoRegression package. If you have questions or feedback, feel free to connect by posting an issue on the TwoRegression GitHub page. Happy coding!