Averaging of profiles and retrieval of distributions

A data exploration tool for averaging and accessing large data sets of snow stratigraphy profiles useful for avalanche forecasting

This vignette creates a link between the methods described in the paper

Herla, F., Haegeli, P., and Mair, P.: A data exploration tool for averaging and accessing large data sets of snow stratigraphy profiles useful for avalanche forecasting, The Cryosphere, https://doi.org/10.5194/tc-2022-29, accepted.

and this companion R package. While the basic workflow of matching layers between pairs of snow stratigraphy profiles and thereby aligning them is described in the vignette Basic workflow, this vignette describes how larger groups of profiles can be aggregated into a representative summary profile (average profile) and how underlying distributions of layer or profile properties can be queried through the average profile.

library(sarp.snowprofile.alignment)
#> Loading required package: sarp.snowprofile

To demonstrate how useful aggregations and distributions of snow profiles can be computed with our R package, we use two very simplistic and brief example data sets of snow profiles: SPgroup2 and SPspacetime. While the results of these simplified examples will not look overly interesting, these examples teach you how to apply our tools to your own more challenging data sets.

As detailed in the reference paper, this package includes two different averaging algorithms for snow profiles, one that can be applied to any group of profiles (default algorithm) and one that can be applied to seasonal data sets of profiles (timeseries algorithm).

1. Default algorithm

1.1 Averaging of profiles

The default averaging algorithm can be accessed conveniently through the function averageSP and is implemented exactly as described in the reference paper. You can also consult the documentation (?averageSP) for more details.

## compute the average profile (here with default settings)
avgSP <- averageSP(SPgroup2)
## plot profile set and resulting average profile
plot(SPgroup2, SortMethod = 'hs', xticklabels = 'originalIndices')
plot(avgSP$avg)

Note: To speed up the computation in this vignette, the above figure was not computed with default settings in the background, but with a limited number of initial conditions. The displayed result might not represent the optimal average profile, so recompute it yourself with the default settings displayed above, or experiment with other settings!

It might be worthwhile to reflect on which layers you consider weak layers. With the function argument classifyPWLs you can make the averaging algorithm consider your perspective on weak layers and make them more likely to get included in the resulting profile. One approach (the default) could be to consider all surface/depth hoar layers as weak layers. Another approach could be to consider all persistent grain types that have one particular stability index above or below a certain threshold. Or you can also consider multiple different stability indices. Make sure to consult the documentation for some more details and tips.

1.2 Retrieval of distributions: Backtracking of layers

Since all layers of all individual profiles are matched against the layers of the average profile, we can backtrack each layer of the average profile to obtain all underlying layers that were matched against it. This allows us to compute various layer distributions or profile distributions as visualized in Figure 2 of the reference paper:

To compute an underlying distribution for the average example profile from above, we have to call the function backtrackLayers. You can either backtrack all layers of the average profile, or select specific layers of your interest.

Backtracking individual layers

In the next example, let’s only backtrack the surface hoar layer that is buried just below the new snow in the average example profile:

## identify layer of interest: we need its row index
deepestSH_index <- min(findPWL(avgSP$avg, pwl_gtype = "SH"))  # this is the deepest SH of the profile, in our specific case exactly what we want
## alternatively, you can (additionally) query for its date, 
## or you can identify it manually by investigating the profile layers data frame:
# View(avgSP$avg$layers)

## backtrack this one layer
backtrackedlayers <- backtrackLayers(avgSP$avg, layer = deepestSH_index, profileSet = avgSP$set)
## analyze result:
str(backtrackedlayers)  
#> List of 1
#>  $ 108.5:'data.frame':   5 obs. of  29 variables:
#>   ..$ station        : int [1:5] 77561 80858 75907 84723 82519
#>   ..$ datetime       : POSIXct[1:5], format: "2019-01-21 17:00:00" "2019-01-21 17:00:00" ...
#>   ..$ elev           : int [1:5] 1979 1859 1845 2037 1871
#>   ..$ band           : chr [1:5] "TL" "TL" "TL" "TL" ...
#>   ..$ angle          : num [1:5] 0 0 0 0 0
#>   ..$ aspect         : num [1:5] 0 0 0 0 0
#>   ..$ hs             : num [1:5] 111 114 142 131 118
#>   ..$ station_id     : chr [1:5] "VIR077561" "VIR080858" "VIR075907" "VIR084723" ...
#>   ..$ date           : Date[1:5], format: "2019-01-21" "2019-01-21" ...
#>   ..$ depth          : num [1:5] 6.5 8.5 9 9.5 9.5
#>   ..$ height         : num [1:5] 104 106 134 122 108
#>   ..$ thickness      : num [1:5] 0.5 0.5 0.5 0.5 0.5
#>   ..$ ddate          : POSIXct[1:5], format: "2019-01-17 15:30:00" "2019-01-16 09:30:00" ...
#>   ..$ gtype          : Factor w/ 9 levels "DF","DH","FC",..: 7 2 3 7 3
#>   ..$ hardness       : num [1:5] 1.36 1 1 1.12 1
#>   ..$ gsize          : num [1:5] 1.24 1.55 1.32 3.65 1.31
#>   ..$ density        : num [1:5] 171 113 100 140 95
#>   ..$ sphericity     : num [1:5] 0 0.09 0.27 0 0.25
#>   ..$ v_strain_rate  : num [1:5] -1 -0.2 -0.5 -0.4 -0.5
#>   ..$ rta            : num [1:5] 0.69 0.7 0.58 0.99 0.67
#>   ..$ sk38           : num [1:5] 6 6 6 6 6
#>   ..$ shear_strength : num [1:5] 0.31 0.22 0.17 0.32 0.16
#>   ..$ slab_rho       : num [1:5] 120 107 129 96 112
#>   ..$ slab_rhogs     : num [1:5] 241 211 280 199 241
#>   ..$ crit_cut_length: num [1:5] 0.1 0.08 0.14 0.03 0.11
#>   ..$ p_unstable     : num [1:5] 0.64 0.6 0.78 0.63 0.7
#>   ..$ bdate          : POSIXct[1:5], format: "2019-01-18 10:30:00" "2019-01-18 05:15:00" ...
#>   ..$ datetag        : Date[1:5], format: "2019-01-18" "2019-01-18" ...
#>   ..$ layerOfInterest: logi [1:5] TRUE TRUE FALSE TRUE FALSE
## the backtrackedLayers object is a list of data frames, one data frame for each averaged layer (in our case 1); 
## list elements are named by the height (cm) of the averaged layer
## compute whatever distribution you're interested: e.g., depth histogram
par(cex = 0.3)
hist(backtrackedlayers[[1]]$depth, 
     main = "Depth distribution of deepest SH layer", xlab = "Depth (cm)")

Backtracking all layers simultaneously

This is most meaningful if you’re interested in the stability distributions of all layers.

## backtrack all layers by not providing a row index
backtrackedlayers <- backtrackLayers(avgSP$avg, profileSet = avgSP$set)
## identify proportion of underlying layers with good/fair/poor stability 
## (here with RTA, but feel free to choose any other index or thresholds!)
transitional <- sapply(backtrackedlayers, function(bti) {
  sum(bti$rta >= 0.6)/ length(bti$rta)
  })  # proportion of layers fair stability
poor <- sapply(backtrackedlayers, function(bti) {
  sum(bti$rta >= 0.8)/ length(bti$rta)
  })  # proportion of layers poor stability

Currently, visualizing the stability distributions for the entire profile needs to be done by hand because no convenient wrapper function exists yet. In the following you get an idea of how that can be done:

## visualize profile
layout(matrix(c(2, 1), 1, 2, byrow = TRUE), c(1.2, 1.8))
par(mar = c(9.1, 0, 2.1, 2.1), bg = "transparent", cex = 0.3)
plot(avgSP$avg, axes = FALSE, xlab = "")
axis(1, at = seq(5), labels = c("F", "4F", "1F", "P", "K"))
mtext("Hardness", side = 1, line = 5.5, cex = 0.3)
##
## stacked barplot for stability distributions
par(mar = c(9.1, 5.1, 2.1, 0))
x0 <- 0
ygrid <- as.numeric(names(backtrackedlayers))  # the names of the list define the height grid
cols <- c("gray90", "gray70", "gray20")
stackedhist <- rbind(data.frame(xleft = 0, xright = 1-transitional, ytop = ygrid, 
                                ybottom = c(0, ygrid[1:(length(ygrid)-1)]), col = cols[1]),
                     data.frame(xleft = 1-transitional, xright = (1-transitional) + transitional, 
                                ytop = ygrid, ybottom = c(0, ygrid[1:(length(ygrid)-1)]), col = cols[2]),
                     data.frame(xleft = 1-poor, xright = 1, ytop = ygrid, 
                                ybottom = c(0, ygrid[1:(length(ygrid)-1)]), col = cols[3]))
plot(ygrid, xlim = c(0, 1), ylim = c(0, max(ygrid)), frame.plot = FALSE, type = 'n', 
     axes = FALSE, xlab = "", ylab = "Height (cm)")
mtext("Proportion of  \nindividual profiles", side = 1, line = 5.5, cex = 0.3)
rect(xleft = x0+stackedhist$xleft, ybottom = stackedhist$ybottom, 
     xright = x0+stackedhist$xright, ytop = stackedhist$ytop, 
     col = stackedhist$col, border = NA)
xaxisticks <- c(0, 0.2, 0.5, 0.8, 1)
axis(1, at = xaxisticks, labels = rev(xaxisticks))
axis(2, at = pretty(ygrid))
abline(v = 0.2, lty = "dotted", col = "gray")
abline(v = 0.5, lty = "dotted", col = "gray")
abline(v = 0.8, lty = "dotted", col = "gray")

Admittedly, the distribution looks very pixelated, but that’s solely due to the few number of underlying profiles (it’s 5 underlying profiles in our case here). Grab a data set of yours and try it yourself with more profiles!

2. Timeseries algorithm

To run the timeseries algorithm, you need spatially distributed snow profiles from consecutive days of a season in the form of a snowprofileSet object. So far, temporal sampling needs to be at 1 day, or in other words for every grid point you need one profile per day. If that’s not sufficient for you, please reach out, so that we can potentially tweak these requirements for you.

The timeseries averaging algorithm is accessible through the function averageSPalongSeason. If you provide the function with your profile set, it will start averaging all profiles from the first day and then iterate through the time range available within your profile set. If you want to create an average time series on a day-by-day basis as the season progresses, you can provide the average profile from the day before as initial condition to save lots of computing time.

While labeling of weak layers is optional in the default algorithm, because a simple default approach has been implemented, for various reasons the user needs to label weak layers herself before calling the timeseries algorithm:

## labeling of weak layers; again you can choose your own rules and thresholds!
SPspacetime <- snowprofileSet(lapply(SPspacetime, function(sp) {
  labelPWL(sp, pwl_gtype = c("SH", "DH", "FC", "FCxr"), threshold_RTA = 0.8)
  }))  # label weak layers in each profile of the profile set 'SPspacetime'

## average along several days
avgSP <- averageSPalongSeason(SPspacetime)
## explore the average timeseries object:
names(avgSP)
#> [1] "avgs" "sets" "call" "meta"
avgSP$call
#> averageSPalongSeason(SPx = SPspacetime)
avgSP$meta
#>         date reinitialized      rmse        hs hs_median thicknessPPDF_median
#> 1 2018-12-09          TRUE 0.1951225  78.50000  78.52925                 1.00
#> 2 2018-12-10         FALSE 0.3246647  81.03765  81.03765                 3.50
#> 3 2018-12-11         FALSE 0.2910004  88.39690  88.39690                12.75
#> 4 2018-12-12         FALSE 0.2760269  95.65220  95.65220                20.50
#> 5 2018-12-13         FALSE 0.2862271 107.57830 107.57830                36.25

## visualize the time series
par(cex = 0.3)
plot(avgSP$avgs, main = "Timeseries of average profile with median HS and median new snow amounts highligted", xlab = "Daily progression")
lines(avgSP$meta$date, avgSP$meta$hs_median)
lines(avgSP$meta$date, avgSP$meta$hs - avgSP$meta$thicknessPPDF_median, lty = "dashed")

The median height or depth of predominant layers will be close to the height or depth as displayed by the average time series. However, since the time series also displays the median snow height or the median new snow amounts, the displayed height or depth of individual layers will not always be correct. If you’re interested in the actual median height/depth, you can look up the layer properties medianPredominantHeight, medianPredominantDepth and visualize them in the time series with the help of the following brief helper function:

## brief helper function for median vertical locations of specific layers (i.e., height or depth)
medianVLOC <- function(avgObj, pwldate, vloc = "Depth", pwlgt = c("SH", "DH"), date_range_earl = -5, draw = TRUE) {
  mvl <- unname(do.call("c", lapply(avgObj$avgs, function(avg) {
    median(avg$layers[, paste0("medianPredominant", vloc)][findPWL(avg, pwl_gtype = pwlgt, pwl_date = pwldate, 
                                                                   date_range_earlier = as.difftime(date_range_earl, units = "days"))])
  })))
  if (draw) {
    if (vloc == "Depth") lines(avgObj$meta$date, avgObj$meta$hs_median - mvl, lty = "dotted", lwd = 2)
    else if (vloc == "Height") lines(avgObj$meta$date, mvl, lty = "dotted", lwd = 2)
  }
  return(mvl)
}

## plot time series...
par(cex = 0.3)
plot(avgSP$avgs, main = "Time series with median depth of middle Nov 22 DH layer highlighted", xlab = "Daily progression")
## ... and apply above function to the Nov 22 weak layer
medianDepth_NOV22 <- medianVLOC(avgSP, "2018-11-23", vloc = "Depth")

2.1 Visualizing layer distributions in the timeseries

Analogously to the default algorithm, you can backtrack layers in the timeseries application. Since you will have a lot more profiles and layers to backtrack you need to account for more computation time though. To speed up cases that we find particularly insightful, the timeseries object will already contain a few of these backtracked distributions, for example the distributions of layer stabilities as diagnosed by the index p_unstable (Mayer et al, 2022). You can find that distribution as layer property ppu (proportion p_unstable) and ppu_all. While ppu contains the stability distribution of the predominant layer, ppu_all contains the stability distribution of all layers that were matched against an average layer.

To make the package as customizable as possible, you can plot a time series of snow profiles based various different properties (see ?plot.snowprofileSet). Most often we personally use grain types to visualize sets of profiles. However, besides various different stability indices, you can also plot percentages of distributions. Since we cannot anticipate what kind of distributions you will be interested in, you have to rename your distribution of interest into a layer property percentage to visualize it conveniently. For the stability distribution of p_unstable, this can be done as follows:

## rename the variable ppu_all to 'percentage' for subsequent plotting
avgSP$avgs <- snowprofileSet(lapply(avgSP$avgs, function(avg) {
  avg$layers$percentage <- avg$layers$ppu_all
  avg
}))

## overplot the grain type time series with the stability distribution
par(cex = 0.3)
plot(avgSP$avgs[avgSP$meta$date>="2018-09-20"], colAlpha = 0.5, main = "time series of average profile overplotted with stability distribution", xlab = "Daily progression")
plot(avgSP$avgs[avgSP$meta$date>="2018-09-20"], ColParam = "percentage", add = TRUE)

Again, the resulting figure of our demo example looks pixelated and not really helpful, but if you do this for an entire season of profiles from a proper forecast region, you will end up with a time series as shown in Figure 4 of the reference paper:

3. Next steps

Now grab your own data sets of snow profiles and try these examples yourself. Always remember that each function has a proper documentation that you can consult, and most of the functions are implemented highly customizable. If you happen to need even more of a template, you can go to the Data and Code repository of the reference paper at [https://osf.io/7ma6g/] and follow the calculations of the paper step-by-step. And finally, if you run into bugs, please excuse us and reach out to us so that we can fix them!