This example shows how to use and interpret different absorbing Markov chain metrics with a perfect maze. A perfect maze is a maze that has no loops, every point is reachable, and there is only a single path between any two points. The motivation for this type of example is that is fun and interesting while having very simple properties that make it easy to visually and numerically interpret the results of the different metrics available in the package.
The complete code for this series is available on Github.
library(samc)
library(raster)
library(gdistance)
library(viridisLite)
Create a function to reduce redundant plotting code and make the examples easier to read:
<- function(map, title, colors) {
plot_maze # start = 1 (top left), finish = last element (bottom right)
<- xyFromCell(map, c(1, length(map)))
sf
plot(map, main = title, col = colors, axes = FALSE, box = FALSE, asp = 1)
plot(rasterToPolygons(map), border = 'black', lwd = 1, add = TRUE)
points(sf, pch = c('S', 'F'), cex = 1, font = 2)
}
Create a simple color palette for when only one or two colors are needed in a figure:
# A simple color palette with 2 colors
<- viridis(3)[2:3] vir_col
Setup up a resistance map representing the maze and use the gdistance
package to get information about the solution (the shortest path). Assume the start is the top left and the finish is the bottom right:
= matrix(
maze_res c(1,0,0,0,0,1,1,1,1,1,1,0,1,0,0,0,0,1,0,1,
1,1,1,1,0,1,0,1,0,0,1,1,1,0,1,1,0,1,1,1,
1,0,0,1,0,1,0,1,1,0,1,0,1,0,0,1,0,0,1,0,
1,1,0,0,0,1,0,0,0,0,0,0,1,1,0,1,0,1,1,1,
0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,0,0,0,1,
1,1,0,1,0,0,0,1,0,1,0,0,1,0,1,1,1,1,0,1,
0,0,0,0,0,1,0,0,0,1,1,0,1,1,1,0,0,1,0,1,
1,1,0,1,1,1,1,1,0,0,1,0,0,1,0,0,1,1,1,1,
0,1,0,1,0,0,0,1,1,1,1,1,0,1,1,0,1,0,1,0,
1,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,1,
0,1,1,1,1,1,1,1,0,1,1,1,0,1,0,1,1,0,0,1,
1,1,0,0,1,0,0,1,0,1,0,1,1,1,0,1,0,0,1,1,
0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,
0,0,0,0,1,0,0,1,1,1,0,1,0,1,1,1,0,0,1,1,
1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,0,0,1,1,0,
1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,1,
1,1,1,1,0,1,0,0,0,1,1,0,1,1,0,1,1,1,1,0,
0,1,0,1,0,1,1,1,0,0,0,0,0,1,0,1,0,1,0,0,
0,0,0,1,0,0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,
1,1,1,1,1,1,1,1,0,1,0,1,0,1,1,1,1,1,0,1),
nrow = 20, byrow = TRUE
)
<- raster(maze_res, xmn = 0.5, xmx = ncol(maze_res) + 0.5, ymn = 0.5, ymx = nrow(maze_res) + 0.5)
maze_res ==0] <- NA # 0 makes the formatting cleaner above, but NA is needed for true barriers
maze_res[maze_res
# Get info about the shortest path through the maze using gdistance
<- (function() {
lcd <- xyFromCell(maze_res, c(1, 400))
points
<- transition(maze_res, function(x) 1/mean(x), 4)
tr <- geoCorrection(tr)
tr
list(dist = costDistance(tr, points),
path = shortestPath(tr, points[1, ], points[2, ], output="SpatialLines"))
})()
# Basic maze layout
plot_maze(maze_res, "Resistance", vir_col[1])
lines(lcd$path, col = vir_col[2], lw = 3)
Create an absorption map where the finish point is the only source of absorption. It will have an absorption value of 1.0
, which means that once this point is entered, it cannot be left for another cell in the maze:
# End of maze
<- maze_res * 0
maze_finish 20, 20] <- 1
maze_finish[
plot_maze(maze_finish, "Absorption", vir_col)
Given the nature of some of the calculations performed by the package, it’s sometimes possible for floating point precision issues to arise where equality comparisons between two decimal values don’t yield expected results. It’s an unfortunate limitation of all computer hardware and is why functions like all.equal()
exist in base R. Unfortunately, there isn’t a base function available for the specific comparisons we want to perform, so we will manually have to do it. In order to do so, we will need a tolerance value and will use the default tolerance used by all.equal()
:
= sqrt(.Machine$double.eps) # Default tolerance in functions like all.equal()
tolerance
print(tolerance)
#> [1] 1.490116e-08
samc
objectWith the resistance and absorption maps prepared, a samc object can be created:
<- list(fun = function(x) 1/mean(x), dir = 4, sym = TRUE)
tr
<- samc(maze_res, maze_finish, tr_args = tr)
maze_samc
<- locate(maze_samc, data.frame(x = 1, y = 20))
maze_origin <- locate(maze_samc, data.frame(x = 20, y = 1)) maze_dest
The model represented by this samc object has been setup to assume a simple random walk. There is no “memory” of the past to stop the individual from going back to dead ends, there is no ability to “look ahead” and see dead ends, and the individual will always move to a different cell every time step.
Note the use of only four directions. This prevents diagonal movements in the maze, which in turn will have important consequences for short-term metrics (discussed later).
The start and finish locations are obtained from the samc object using the locate()
function. It’s important to remember that the results from xyFromCell()
(used above for the RasterLayer object) do NOT work for samc objects. Although the results from both functions may be equivalent in certain cases, they generally will not be equivalent and cannot be used interchangeably.
We will start with determining how long, on average, it would be expected for a single individual to finish the maze. Formally, the survival()
function calculates the expected time to absorption, but since the maze has only one absorption point representing the finish point, this means that, in this example, survival()
can be interpreted as the expected time to finish the maze:
<- survival(maze_samc)
maze_surv
plot_maze(map(maze_samc, maze_surv), "Expected time to finish", viridis(256))
The result is a vector with the expected time to absorption, or finish, for every point in the maze, not just the top left start point. The start location can be used to extract that information from the vector:
maze_surv[maze_origin]#> [1] 13869
When there is only a single point of total absorption, like this example, survival()
and cond_passage()
are nearly identical when the destination for cond_passage()
is set to the point of absorption (the finish point):
<- cond_passage(maze_samc, dest = maze_dest)
maze_cond
maze_cond[maze_origin]#> 1
#> 13868
It turns out that the cell in the map corresponding to our finish point is not actually an absorbing state; it’s still a transient state. Absorption occurs one time step later when the individual is removed from the maze since it can’t go anywhere else in our example. So cond_passage()
reports how long it takes to reach the final cell, and survival()
tells us how long it takes us to reach the final cell and then be removed from it.
A couple things to keep in mind: - survival()
will have different results and a different interpretation for the maze in more complicated scenarios with multiple points of absorption. - cond_passage()
normally does not work when there are states that lead to total absorption. The only exception is when there is a single instance of a state leading to total absorption present and the dest
parameter is set to it. In that case, the absorption state is actually effectively ignored. This is why cond_passage()
works as shown in this example, but if attempting to change the dest
parameter to a different location will cause it to fail. cond_passage()
can still be used for other dest
values, but first the absorption value would have to be changed to a value less than 1
.
If we’re interested in knowing the probability that a particular cell (or cells) is visited at least once, then we can use the dispersal()
function:
<- dispersal(maze_samc, origin = maze_origin)
maze_disp #>
#> Cached diagonal not found.
#> Performing setup. This can take several minutes... Complete.
#> Calculating matrix inverse diagonal...
#>
: 100% (done)
Computing#>
Complete #> Diagonal has been cached. Continuing with metric calculation...
plot_maze(map(maze_samc, maze_disp), "Probability of Visit", viridis(256))
In order to complete the maze, the individual has to visit every cell along the path to the exit, so all of those cells will have a probability of 1.0
. The farther away from this path a cell is located, the lower the probability it will be visited. Additionally, the closer the individual gets to the finish, the less likely they will spend time taking incorrect routes before stumbling upon the finish. There’s also the possibility for scenarios like the one where the individual is near the finish and then manages to stumble all the way back to the start. These different aspects contribute to more time spent near the start point, on average, which in turn means that cells near the start have a higher probability of being visited.
The previous paragraph hinted that the results from the dispersal()
function can be used to identify the route through the maze:
# Ideally would use `as.numeric(maze_disp == 1)`, but floating point precision issues force an approximation
<- as.numeric(abs(maze_disp - 1) < tolerance)
maze_disp_sol
plot_maze(map(maze_samc, maze_disp_sol), "Solution Using Dispersal()", vir_col)
Like the survival()
function, there is an important caveat: The visitation probabilities don’t include time 0
. So while it looks like the starting cell might have a probability of 1
in the first figure, it’s actually slightly less and it just wasn’t apparent with the color scale:
maze_disp[maze_origin]#> 1
#> 0.9864865
The package can be used to see how many times each cell in the maze is expected to be visited on average. This is done using the visitation()
metric:
<- visitation(maze_samc, origin = maze_origin)
maze_visit
plot_maze(map(maze_samc, maze_visit), "Visits Per Cell", viridis(256))
Since the finish point leads to total absorption, it will only be visited once:
maze_visit[maze_dest]#> [1] 1
Using the distribution()
function, the location of an individual in the maze can be predicted for a given time:
<- distribution(maze_samc, origin = maze_origin, time = 20)
maze_dist
plot_maze(map(maze_samc, maze_dist), "Location at t=20", col = viridis(256))
There’s an odd pattern here that can be emphasized by incrementing the time from 20 to 21:
<- distribution(maze_samc, origin = maze_origin, time = 21)
maze_dist
plot_maze(map(maze_samc, maze_dist), "Location at t=21", viridis(256))
Earlier it was mentioned that using only 4 directions for the transition function would have consequences. Without either diagonal transitions or some form of fidelity (the possibility of an individual not moving), there will be an alternating pattern where a cell can and then cannot be occupied. It’s easiest to reason about the effect by running the same function for time steps 1-5 and visualizing it, an exercise that will be left to those interested.
There are a couple metrics that have not been covered yet: absorption()
and mortality()
. For the example provide thus far, they don’t do anything useful. mortality()
would only report a 100% probability of absorption at the finish point; it will be more useful when there are multiple locations with absorption probabilities. absorption()
is only useful when there are different types of absorption.
Many of the metrics offer a more advanced and flexible option for setting an initial state in the absorbing Markov chain. Inputs to the occ
parameter (short for occupancy) can essentially be used as an alternative to specifying a singular start point. Here is an example occ
map that represents the scenario that has been presented so far:
<- maze_res * 0
maze_occ 1, 1] <- 1
maze_occ[
plot_maze(maze_occ, "Occupancy", vir_col)
Using the survival()
metric, it produces the same result as before:
survival(maze_samc, occ = maze_occ)
#> [1] 13869
maze_surv[maze_origin]#> [1] 13869
But now,more interesting scenarios can be tested. For example, let’s start the maze with 3 individuals:
# Scenario 1: 3 people start in the maze
<- maze_res * 0
maze_occ3 1, 1] <- 3
maze_occ3[
survival(maze_samc, occ = maze_occ3)
#> [1] 41607
The interpretation becomes a little trickier here. The survival()
metric is returning the cumulative time spent in the maze for all the individuals. This means that with 3 individuals starting at the beginning, 3 times as much time is going to be spent in the maze, but that per individual, it will still take the same number of time steps to finish as before:
survival(maze_samc, occ = maze_occ3) / 3
#> [1] 13869
Here’s a different scenario with 3 individuals again, but now they start in different corners of the maze:
# Scenario 2: A person starts in each corner of the maze
<- maze_res * 0
maze_occ3 1, 1] <- 1
maze_occ3[20, 1] <- 1
maze_occ3[1, 20] <- 1
maze_occ3[
plot_maze(maze_occ3, "Occupancy", vir_col)
survival(maze_samc, occ = maze_occ3)
#> [1] 21949
Again, the cumulative time spent in the maze is larger than reported in the original scenario with one individual. However, this time the average time per individual spent in the maze decreases:
survival(maze_samc, occ = maze_occ3) / 3
#> [1] 7316.333
This is because the additional individuals are now located in corners that are substantially closer to the finish point, so they would be expected to find it faster on average. Based on where each individual starts, it would be reasonable to expect that the individual starting in the bottom left has the biggest advantage. One way to try and visualize this is with the distrbution()
metric:
<- distribution(maze_samc, occ = maze_occ3, time = 17)
maze_occ3_dist
# This makes it easier to see how far along the individuals could be
<- as.numeric(maze_occ3_dist > 0)
maze_occ3_dist
plot_maze(map(maze_samc, maze_occ3_dist), "Location at t=17", viridis(256))
A good exercise for the reader is to apply the concepts of the animations vignette to this figure.
In future parts, many different modifications will be made to the maze to explore how they change the results of different metrics and how the different metrics are related. These modifications include the use of fidelity to simulate scenarios where the individual pauses to think about their next decision, modifying the resistance input to simulate “looking ahead” to perform dead end avoidance, and modifying the absorption input to simulate different sources of absorption such as lethal traps.