Tour de trackeR

Hannah Frick and Ioannis Kosmidis

2019-05-15

The trackeR package provides infrastructure for handling running and cycling data from GPS-enabled tracking devices. A short demonstration of its functionality is provided below, based on data from running activities. A more comprehensive introduction to the package can be found in the vignette “Infrastructure for Running and Cycling Data”, which can be accessed by typing

vignette("trackeR", package = "trackeR")

Reading data

trackeR can currently import files in the Training Centre XML (TCX) format and .db3 files (SQLite databases, used, for example, by devices from GPSports) through the corresponding functions readTCX() and readDB3(). It also offers support for JSON files from Golden Cheetah via readJSON().

library("trackeR")
filepath <- system.file("extdata/tcx/", "2013-06-01-183220.TCX.gz", package = "trackeR")
runDF <- readTCX(file = filepath, timezone = "GMT")

These read functions return a data.frame with the following structure:

str(runDF)
#> 'data.frame':    3881 obs. of  11 variables:
#>  $ time           : POSIXct, format: "2013-06-01 17:32:20" "2013-06-01 17:32:21" ...
#>  $ latitude       : num  50.8 50.8 50.8 50.8 50.8 ...
#>  $ longitude      : num  -1.7 -1.7 -1.7 -1.7 -1.7 ...
#>  $ altitude       : num  83.4 83.8 84 83.8 83.6 ...
#>  $ distance       : num  1.26 3.3 7.12 11.12 16.76 ...
#>  $ heart_rate     : num  56 61 61 71 71 74 74 85 85 85 ...
#>  $ speed          : num  0.885 1.209 1.801 2.205 2.756 ...
#>  $ cadence_running: num  60 63 70 78 83 84 84 85 85 86 ...
#>  $ cadence_cycling: logi  NA NA NA NA NA NA ...
#>  $ power          : logi  NA NA NA NA NA NA ...
#>  $ temperature    : logi  NA NA NA NA NA NA ...
#>  - attr(*, "sport")= chr "running"
#>  - attr(*, "file")= chr "/private/var/folders/3t/00tlvfn14zq5v45q3q3y63cm0000gn/T/RtmpZ5Y4jK/Rinst47be3b7dbfee/trackeR/extdata/tcx//2013"| __truncated__

That data.frame can be used as an input to the constructor function for trackeR’s trackeRdata class, to produce a session-based and unit-aware object that can be used for further analyses.

runTr0 <- trackeRdata(runDF)
#> Warning in sanity_checks(dat = dat, silent = silent): Observations with
#> duplicated time stamps have been removed.
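
The warning above reports that observations sharing a time stamp were dropped. A minimal base R sketch of that kind of check is given below, on hypothetical toy data. It assumes that the first record for each time stamp is the one kept, which may not match what trackeR's own sanity checks do.

```r
# Hypothetical toy data: one-second records with one repeated time stamp.
toy <- data.frame(
  time = as.POSIXct("2013-06-01 17:32:20", tz = "GMT") + c(0, 1, 1, 2, 3),
  speed = c(0.9, 1.2, 1.3, 1.8, 2.2)
)

# Assumption: keep only the first record for each time stamp.
toy_clean <- toy[!duplicated(toy$time), ]
nrow(toy_clean)
```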

The read_container() function combines the two steps of importing the data and constructing the trackeRdata object.

runTr1 <- read_container(filepath, type = "tcx", timezone = "GMT")
#> Warning in sanity_checks(dat = dat, silent = silent): Observations with
#> duplicated time stamps have been removed.
identical(runTr0, runTr1)
#> [1] TRUE

The read_directory() function can be used to read all supported files in a directory and produce the corresponding trackeRdata objects.

Visualisations

The package includes an example data set which can be accessed through

data("runs", package = "trackeR")

The default behaviour of the plot method for trackeRdata objects is to show how heart rate and pace evolve over the session.

plot(runs, session = 1:7)

The elevation profile of a training session can also be plotted, here alongside the pace.

plot(runs, session = 8, what = c("altitude", "pace"))

The route taken during a training session can also be plotted on maps from various sources, e.g., Google or OpenStreetMap. This can be done either on a static map

tryCatch(plot_route(runs, session = 1, source = "stamen"),
         error = function(x) "Failed to download map data")

or on an interactive map.

tryCatch(leaflet_route(runs, session = c(1, 6, 12)),
         error = function(x) "Failed to download map data")

Session summaries

The summary of sessions includes basic statistics like duration, time spent moving, average speed, pace, and heart rate. The speed threshold used to distinguish moving from resting can be set by the argument moving_threshold.

summary(runs, session = 1, moving_threshold = c(cycling = 2, running = 1, swimming = 0.5))
#> 
#>  *** Session 1 : running ***
#> 
#>  Session times: 2013-06-01 18:32:15 - 2013-06-01 19:37:56 
#>  Distance: 14130.7 m 
#>  Duration: 65.68 mins 
#>  Moving time: 64.17 mins 
#>  Average speed: 3.59 m_per_s 
#>  Average speed moving: 3.67 m_per_s 
#>  Average pace (per 1 km): 4:38 min:sec
#>  Average pace moving (per 1 km): 4:32 min:sec
#>  Average cadence running: 88.66 steps_per_min 
#>  Average cadence cycling: NA rev_per_min 
#>  Average cadence running moving: 88.87 steps_per_min 
#>  Average cadence cycling moving: NA rev_per_min 
#>  Average power: NA W 
#>  Average power moving: NA W 
#>  Average heart rate: 141.11 bpm 
#>  Average heart rate moving: 141.13 bpm 
#>  Average heart rate resting: 136.76 bpm 
#>  Average temperature: NA C 
#>  Total elevation gain: 94.2 m 
#>  Work to rest ratio: 42.31 
#> 
#>  Moving thresholds: 2.0 (cycling) 1.0 (running) 0.5 (swimming) m_per_s 
#>  Unit reference sport: running
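
To illustrate how a moving threshold separates moving from resting, and how an average speed relates to a pace per kilometre, here is a small base R sketch. The speed values, the one-second sampling interval, the strict inequality, and the helper pace_per_km() are all assumptions made for illustration. They are not taken from trackeR's implementation.

```r
# Hypothetical speed records in m/s, assumed to be one second apart.
speed <- c(0.0, 0.4, 2.8, 3.5, 3.7, 3.6, 0.2, 3.4)
dt <- 1                # assumed sampling interval in seconds
moving_threshold <- 1  # m/s, the value used for running above

# Assumption: a record counts as "moving" if speed exceeds the threshold.
moving <- speed > moving_threshold
duration_s <- length(speed) * dt
moving_time_s <- sum(moving) * dt
avg_speed_moving <- mean(speed[moving])

# Hypothetical helper: convert a speed in m/s to a min:sec pace per km,
# truncating to whole seconds.
pace_per_km <- function(speed_m_per_s) {
  secs <- floor(1000 / speed_m_per_s)
  sprintf("%d:%02d", as.integer(secs %/% 60), as.integer(secs %% 60))
}

pace_per_km(3.59)
pace_per_km(3.67)
```

With the two average speeds printed in the summary, 3.59 and 3.67 m/s, this conversion gives 4:38 and 4:32, in line with the reported paces.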

It is usually desirable to visualise summaries from multiple sessions. This can be done using the plot method for summary objects. Below, we produce such a plot for average heart rate, average speed, distance, and duration.

runs_summary <- summary(runs)
plot(runs_summary, group = c("total", "moving"),
     what = c("avgSpeed", "distance", "duration", "avgHeartRate"))

The timeline plot is useful for visualising the dates and times at which the sessions took place, and it gives an indication of their relative durations.

timeline(runs_summary)
#> Warning in `[<-.data.frame`(`*tmp*`, nl, value = list(sport =
#> c("running", : replacement element 1 has 27 rows to replace 26 rows

Time in zones

The time spent training in certain zones, e.g., speed zones, can also be calculated and visualised.

run_zones <- zones(runs[1:4], what = "speed", breaks = c(0, 2:6, 12.5))
plot(run_zones)
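
The idea behind time in zones can be sketched in base R with cut(). The speeds below are hypothetical and assumed to be recorded one second apart. Treating zone intervals as closed on the left is also an assumption, so zones() may assign boundary values differently.

```r
# Hypothetical speeds in m/s, assumed to be recorded one second apart.
speed <- c(1.5, 2.5, 3.2, 3.8, 4.1, 4.4, 5.2, 3.9, 2.1, 0.5)
breaks <- c(0, 2:6, 12.5)

# Assign each record to a zone; intervals are closed on the left here.
zone <- cut(speed, breaks = breaks, right = FALSE, include.lowest = TRUE)

# One second per record, so counts are seconds spent in each zone.
time_in_zone_s <- table(zone) * 1
percent_in_zone <- 100 * prop.table(table(zone))
time_in_zone_s
```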

Quantifying work capacity via W’ (W prime)

trackeR can also be used to calculate and visualise the work capacity W’ (pronounced as W prime). The comprehensive vignette “Infrastructure for Running and Cycling Data” provides the definition of work capacity and details on the version and quantity arguments.

wexp <- Wprime(runs, session = 11, quantity = "expended", cp = 4, version = "2012")
plot(wexp, scaled = TRUE)

Distribution and concentration profiles

Kosmidis and Passfield (2015) introduce the concept of distribution and concentration profiles, for which trackeR provides an implementation. These profiles are motivated by the need to compare sessions and to use information on variables such as heart rate or speed during a session for further modelling.

The distribution profile for a variable such as speed or heart rate describes the time spent exercising above a (speed or heart rate) threshold.

Here, the distribution profiles for the first 4 sessions are calculated for speed with thresholds ranging from 0 to 12.5 m/s in increments of 0.05 m/s.

d_profile <- distribution_profile(runs, session = 1:4, what = "speed",
                                  grid = list(speed = seq(0, 12.5, by = 0.05)))
plot(d_profile, multiple = TRUE)
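
As a rough illustration of the definition, a distribution profile can be computed by hand as the time spent above each threshold on a grid. The sketch below uses hypothetical speeds assumed to be one second apart. It is not the code that distribution_profile() runs, which also has to deal with irregular sampling and units.

```r
# Hypothetical speeds in m/s, assumed to be recorded one second apart.
speed <- c(2.0, 3.1, 3.6, 3.4, 4.2, 4.5, 3.8, 2.9)
grid <- seq(0, 12.5, by = 0.05)

# Time (in seconds) spent strictly above each threshold on the grid.
d_profile_toy <- vapply(grid, function(th) sum(speed > th), numeric(1))

# The profile starts at the total time and can only decrease.
range(d_profile_toy)
```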

Sessions 4 and 1 are longer than sessions 2 and 3, as shown by the larger amount of time spent exercising above 0 m/s. Sessions 3 and 4 show a larger amount of time spent exercising above 4 m/s than the other sessions. This is easier to spot in the concentration profiles, which are the negative derivatives of the distribution profiles. The concentration profile for session 3 has one mode at around 3.5 m/s and another above 4 m/s, showing that this session involved training at a combination of low and high speeds.

c_profile <- concentration_profile(d_profile, what = "speed")
plot(c_profile, multiple = TRUE, smooth = TRUE)
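
The "negative derivative" relationship can be sketched with a finite difference. The example below uses hypothetical speeds assumed to be one second apart and builds a toy distribution profile first, so that it is self-contained. The unsmoothed result is spiky, which is one reason smoothing is useful in practice. This is an illustration of the idea, not the computation carried out inside trackeR.

```r
# Hypothetical speeds in m/s, assumed to be recorded one second apart.
speed <- c(2.0, 3.1, 3.6, 3.4, 4.2, 4.5, 3.8, 2.9)
grid <- seq(0, 12.5, by = 0.05)
d_toy <- vapply(grid, function(th) sum(speed > th), numeric(1))

# Negative finite-difference derivative, located at interval midpoints.
c_toy <- -diff(d_toy) / diff(grid)
mid <- head(grid, -1) + diff(grid) / 2

# Integrating the concentration profile recovers the time above 0 m/s.
total_time <- sum(c_toy * diff(grid))
total_time
```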

More details on distribution and concentration profiles can be found in the comprehensive vignette “Infrastructure for Running and Cycling Data”.

Functional principal components analysis

The distribution and concentration profiles can be used for further analysis, such as a functional principal components analysis (PCA) to describe the differences between the profiles.

The concentration profiles for all sessions

runsT <- threshold(runs)
dp_runs <- distribution_profile(runsT, what = "speed")
dp_runs_S <- smoother(dp_runs)
cp_runs <- concentration_profile(dp_runs_S)
plot(cp_runs, multiple = TRUE, smooth = FALSE)

vary in their shape (unimodal or multimodal), height, and location (revealing concentrations at higher or lower speeds). The function funPCA() can be used to fit a functional PCA, here with 4 principal components.

cpPCA <- funPCA(cp_runs, what = "speed", nharm = 4)

For the speed concentration profiles here, the first two components cover 91% of the variability in the profiles.

round(cpPCA$varprop, 2)
#> [1] 0.66 0.25 0.06 0.02
plot(cpPCA, harm = 1:2)
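
For intuition about the proportions of variance, the sketch below runs an ordinary PCA with prcomp() on simulated curves evaluated on a common grid. This is a discretised analogue only: the curves are hypothetical, and funPCA() may well use a different, basis-function approach, so the numbers are not comparable with those above.

```r
set.seed(1)
grid <- seq(0, 6, by = 0.1)

# Hypothetical concentration-like curves: bumps of varying height and location.
n_sessions <- 20
heights <- runif(n_sessions, 20, 60)
locations <- runif(n_sessions, 3, 4.5)
curves <- t(mapply(function(h, m) h * dnorm(grid, mean = m, sd = 0.4),
                   heights, locations))

# Rows are sessions, columns are grid points; PCA centres each column.
pca <- prcomp(curves, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
varprop <- pca$sdev^2 / sum(pca$sdev^2)
round(varprop[1:4], 2)
```

Because the simulated curves differ only in height and location, most of the variability should be captured by the first few components, much as described for the speed concentration profiles.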

A plot of the first two principal components reveals that the profiles here differ mostly in the general height of the curves and the location. These two aspects of the profiles can be captured by two univariate measures of the sessions, the time spent moving and the average speed moving.

## plot scores vs summary statistics
scoresSP <- data.frame(cpPCA$scores)
names(scoresSP) <- paste0("speed_pc", 1:4)
d <- cbind(runs_summary, scoresSP)

library("ggplot2")
## pc1 ~ session duration (moving)
ggplot(d) + geom_point(aes(x = as.numeric(durationMoving), y = speed_pc1)) + theme_bw()
## pc2 ~ avg speed (moving)
ggplot(d) + geom_point(aes(x = avgSpeedMoving, y = speed_pc2)) + theme_bw()

References

Kosmidis, Ioannis, and Louis Passfield. 2015. “Linking the Performance of Endurance Runners to Training and Physiological Effects via Multi-Resolution Elastic Net.”