Ellipse Overlap

Andrew L Jackson

2021-04-13

Calculating the Area of Overlap Between Two Ellipses

# remove previously loaded items from the current environment and remove previous graphics.
rm(list=ls())
graphics.off()

# Here, I set the seed each time so that the results are comparable. 
# This is useful as it means that anyone that runs your code, *should*
# get the same results as you, although random number generators change 
# from time to time.
set.seed(1)

# load SIBER
library(SIBER)
library(viridis)

# set a new three-colour palette from the viridis package
palette(viridis::viridis(3))

# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)


# Or if working with your own data read in from a *.csv file, you would use
# This *.csv file is included with this package. To find its location
# type
# fname <- system.file("extdata", "demo.siber.data.csv", package = "SIBER")
# in your command window. You could load it directly by using the
# returned path, or perhaps better, you could navigate to this folder
# and copy this file to a folder of your own choice, and create a 
# script from this vingette to analyse it. This *.csv file provides
# a template for how your own files should be formatted.

# mydata <- read.csv(fname, header=T)
# siber.example <- createSiberObject(mydata)


# Create lists of plotting arguments to be passed onwards to the  
# plotting functions. With p.interval = NULL, these are SEA. NB not SEAc though
# which is what we will base our overlap calculations on. This implementation 
# needs to be added in a future update. For now, the best way to plot SEAc is to
# add the ellipses manually following the vignette on this topic.
group.ellipses.args  <- list(n = 100, p.interval = NULL, lty = 1, lwd = 2)



par(mfrow=c(1,1))
plotSiberObject(siber.example,
                  ax.pad = 2, 
                  hulls = F, community.hulls.args, 
                  ellipses = T, group.ellipses.args,
                  group.hulls = F, group.hull.args,
                  bty = "L",
                  iso.order = c(1,2),
                  xlab = expression({delta}^13*C~'\u2030'),
                  ylab = expression({delta}^15*N~'\u2030')
                  )
#> Warning in title(...): font metrics unknown for Unicode character U+2030

#> Warning in title(...): font metrics unknown for Unicode character U+2030
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <e2>
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <80>
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <b0>
#> Warning in title(...): font metrics unknown for Unicode character U+2030

#> Warning in title(...): font metrics unknown for Unicode character U+2030
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <e2>
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <80>
#> Warning in title(...): conversion failure on '‰' in 'mbcsToSbcs': dot
#> substituted for <b0>

In order now to calculate the overlap between two (or more) ellipses, we need to know the coordinates of each ellipse. This is done by calling addEllipse(..., do.plot = FALSE). See the associated help file and the vignette Customising-Plots-Manually for more information on optional inputs to addEllipse for different types of ellipse. Also, bear in mind that the option n controls how many points are used to draw the ellipse, and hence low n means clunky, edgier ellipses, compared with rounder, smoother ellipses for higher n. A higher n is more suitable when ellipses are more eccentric as their curvature is greater at the tips. The new functions maxLikOverlap and bayesianOverlap are wrapper functions that take care of the calls to addEllipse and the actual polygon overlap function in the package spatstat.utils. The functions maxLikOverlap and bayesianOverlap return three values each: the computationally calculated area of the first ellipse, second ellipse, and the overlap between them. It is not entirely obvious to me that there is a single choice if you wish to express your overlap as a proportion, since there are several options for the choice of denominator. One can imagine that expressing the overlap as a proportion of the sum of the non-overlapping areas of the ellipses seems suitable in a general sense, since this will range from 0 when the ellipses are completely distinct, to 1 when the ellipses are completely coincidental.

# In this example, I will calculate the overlap between ellipses for groups 2
# and 3 in community 1 (i.e. the green and yellow open circles of data).

# The first ellipse is referenced using a character string representation where 
# in "x.y", "x" is the community, and "y" is the group within that community.
# So in this example: community 1, group 2
ellipse1 <- "1.2" 

# Ellipse two is similarly defined: community 1, group3
ellipse2 <- "1.3"

# The overlap of the maximum likelihood fitted standard ellipses are 
# estimated using
sea.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, 
                             p.interval = NULL, n = 100)

# the overlap betweeen the corresponding 95% prediction ellipses is given by:
ellipse95.overlap <- maxLikOverlap(ellipse1, ellipse2, siber.example, 
                             p.interval = 0.95, n = 100)

# so in this case, the overlap as a proportion of the non-overlapping area of 
# the two ellipses, would be
prop.95.over <- ellipse95.overlap[3] / (ellipse95.overlap[2] + 
                                          ellipse95.overlap[1] -
                                          ellipse95.overlap[3])

The function bayesianOverlap returns multiple rows of these three numbers, each representing the values for a particular draw from the posterior estimates so that you can build up a picture of the distribution of the estimated overlap. Calculating this overlap is computationally time consuming, and there are going to be thousands of posterior samples collected in a typical analysis. For this example, I will calculate the posterior overlap on the first 100 samples, but in reality you would probably want to do this on at least a few hundred, if not all your posterior samples in a longer (perhaps over-lunch or over-night) run.

# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10     # thin the posterior by this many
parms$n.chains <- 2        # run this many chains

# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3

# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the 
# means. Fitting is via the JAGS method.
ellipses.posterior <- siberMVN(siber.example, parms, priors)
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model
#> 
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 30
#>    Unobserved stochastic nodes: 3
#>    Total graph size: 45
#> 
#> Initializing model

# and teh corresponding Bayesian estimates for the overlap between the 
# 95% ellipses is given by:
bayes95.overlap <- bayesianOverlap(ellipse1, ellipse2, ellipses.posterior,
                                   draws = 100, p.interval = 0.95, n = 100)

# a histogram of the overlap
hist(bayes95.overlap[,3], 10)


# and as above, you can express this a proportion of the non-overlapping area of 
# the two ellipses, would be
bayes.prop.95.over <- (bayes95.overlap[,3] / (bayes95.overlap[,2] + 
                                                bayes95.overlap[,1] -
                                                bayes95.overlap[,3])
                       )

hist(bayes.prop.95.over, 10)