The package builds on sarp.snowprofile
for basic data
input/output, profile manipulation and visualization.
This vignette demonstrates the workflow of the analysis methods described in detail in “Herla, F., Horton, S., Mair, P., and Haegeli, P.: Snow profile alignment and similarity assessment for aggregating, clustering, and evaluating of snowpack model output for avalanche forecasting, Geosci. Model Dev., https://doi.org/10.5194/gmd-14-239-2021, 2021.”
library(sarp.snowprofile.alignment)
Dynamic Time Warping (DTW) can be used to match layers between pairs of snow profiles so that one profile can be warped onto the other profile. That means that the warped profile, which contains the unchanged layer sequences, is optimally aligned to the second profile by adjusting its layer thicknesses.
The core functions for aligning profiles are
dtwSP
: calculate the alignment between two
profilesplotSPalignment
: plot the alignmentplotCostDensitySP
: inspect more details behind the
alignment with a cost density plot## Compute alignment:
<- dtwSP(SPpairs$A_modeled, SPpairs$A_manual, open.end = FALSE) dtwAlignment
## Plot alignment:
plotSPalignment(dtwAlignment = dtwAlignment)
Both functions dtwSP
and plotSPalignment
allow for controlling the settings of the alignment, such as matching of
subsequences or constraints to the warping path (i.e., open window size,
local slope constraint). See ?dtwSP
, or the vignette
Technical details for more information.
## Inspect local cost:
plotCostDensitySP(dtwAlignment)
The local cost matrix visualizes the distances (i.e., dissimilarities) between individual layer combinations and ultimately shapes the alignment of the profiles through the warping path of least resistance through the matrix.
The local cost matrix can be modified by changing the included layer characteristics (e.g., include/exclude layer date information), by changing the relative weights of the included layer characteristics, or by modifying the preferential layer matching implementation. Read the vignette Technical details for more information.
The package provides a function simSP
to calculate the
similarity between two aligned profiles based on considerations relevant
for avalanche hazard assessment. The resulting similarity ranges between
[0, 1]
. Note that the recent version of this package
contains several different approaches of how that similarity measure is
computed. See ?simSP
for more details.
The similarity score can be accessed from the alignment object via
$sim
. To use different methods for computing the score,
provide the corresponding parameter simType
to your
function call to dtwSP
. Otherwise, the score can be
calculated for a precomputed alignment object by
$sim <- simSP(dtwAlignment$reference, dtwAlignment$queryWarped, verbose = TRUE, type = "HerlaEtAl2021")
dtwAlignment#> wl cr pp bulk
#> sim [0, 1]: 0.84 1 0.85 0.69
#> simple similarity = 0.847
The above methods get even more interesting and useful when applied
to sets of snow profiles in order to aggregate or cluster them.
Therefore, the package provides medoidSP
, which identifies
the medoid profile among a set of profiles, and a wrapper function
distanceSP
(for dtwSP
and simSP
),
which returns the distance between two profiles (i.e., a common input
for clustering algorithms).
Note that this vignette describes approaches from the Herla et al (2021) paper cited above. Both tasks, aggregating and clustering, need to and will be improved in future work to appropriately address operational needs. Check out new vignettes as the methods improve.
Let’s start with a clustering demonstration of the sample data set
SPgroup
. You don’t need to, but it is recommended to
rescale the set of profiles to identical snow heights before computing a
pairwise distance matrix for the profile set.
## rescaling and resampling of the snow profiles:
<- reScaleSampleSPx(SPgroup)$set
setRR
## compute the pairwise distance matrix:
<- medoidSP(setRR, retDistmat = T, progressbar = FALSE, verbose = TRUE)$distmat
distmat #> You are about to compute 12**2 = 144 profile alignments. Be patient..
#> 1 2 3 4 5 6 7
#> 1 0.0000000 0.1728718 0.30474444 0.3932622 0.3200169 0.3375072 0.2741774
#> 2 0.1130027 0.0000000 0.27948126 0.4167719 0.2427858 0.2749057 0.2317675
#> 3 0.2328315 0.2966986 0.00000000 0.3463768 0.3902615 0.3746297 0.3044912
#> 4 0.4388049 0.4468132 0.35358889 0.0000000 0.5051648 0.4614169 0.4415197
#> 5 0.3375604 0.2368032 0.37629569 0.4957069 0.0000000 0.1433686 0.3444140
#> 6 0.3307160 0.2827186 0.39271205 0.4899647 0.1778283 0.0000000 0.3624286
#> 7 0.2366985 0.2419543 0.30443079 0.4417672 0.3400295 0.3376985 0.0000000
#> 8 0.2761450 0.3151659 0.09527891 0.2956350 0.3896890 0.2890942 0.3154665
#> 9 0.1853992 0.2616605 0.31613565 0.4729739 0.2893061 0.2663625 0.2759453
#> 10 0.3421837 0.3771436 0.30203032 0.3321895 0.4757033 0.3881517 0.3929348
#> 11 0.2245517 0.2778233 0.24038771 0.3597504 0.3409509 0.3386544 0.2713434
#> 12 0.3342179 0.3387808 0.28726746 0.4364509 0.3112002 0.3782750 0.3795790
#> 8 9 10 11 12
#> 1 0.31006879 0.1837075 0.2975668 0.2164958 0.3675254
#> 2 0.33213747 0.3056794 0.3140363 0.3062003 0.3338265
#> 3 0.09974265 0.3072903 0.1671733 0.2035110 0.3306579
#> 4 0.33348944 0.4233123 0.3079825 0.3649860 0.4393048
#> 5 0.38795719 0.3041066 0.3924646 0.3508924 0.4668084
#> 6 0.38979602 0.3287740 0.4285917 0.3218587 0.4435430
#> 7 0.32488480 0.2793745 0.3181877 0.2552203 0.3777790
#> 8 0.00000000 0.2869271 0.1175565 0.2223470 0.3902355
#> 9 0.30873016 0.0000000 0.3004763 0.1741321 0.2484749
#> 10 0.24176054 0.3748009 0.0000000 0.3266810 0.4165677
#> 11 0.25838444 0.2058994 0.2531987 0.0000000 0.1991200
#> 12 0.35399266 0.2254472 0.3164906 0.1526328 0.0000000
## hierarchichal clustering:
<- stats::hclust(as.dist(distmat), method = "complete") setRR_hcl
The source file to the vignette details how to produce the exact plot
based on the cluster object setRR_hcl
:
Cutting the hierarchical tree to obtain four clusters leads to the colored clusters and the vertical black lines you see in the figure.
The simplest method to aggregating, or summarizing, a set of snow profiles is computing the medoid profile of the set. That medoid profile is the profile with the smallest accumulated distance to all profiles in that set. In other words, it’s the profile closest to the geometric center of the set.
The function medoidSP
can be used to conveniently
compute the pairwise distance matrix among all profiles (as done above),
and to elicit the index of the medoid profile from that distance matrix.
The medoid profile of the first (blue) cluster, for example, can be
found with
unname(medoidSP(distmat = distmat[1:4, 1:4]))
#> [1] 2
We can visualize the pairwise distances among all profiles in the set with a configuration plot that is based on multidimensional scaling (MDS). That nicely illustrates why a certain profile is supposed to be the medoid profile.
<- smacof::mds(as.dist(distmat), type = "ordinal") fit
Such an MDS fit can be applied to the whole set, or to individual clusters, and can look similar to the following: