Using the contact package

Trevor S. Farthing

2021-05-14

Package description and vignette purpose

The contact package is intended to allow for easy processing of spatiotemporal data into contact and social networks, and facilitate network analysis by randomizing individuals’ movement paths and/or related categorical variables. To use this package, users need only have a dataset containing spatial data (i.e., latitude/longitude, or planar xy coordinates), individual IDs relating spatial data to specific individuals, and date/time information relating spatial locations to temporal locations. The functionality of this package ranges from data “cleaning” via multiple filtration functions, to spatial and temporal data interpolation, and network creation and summarization. Functions within this package are not limited to describing interpersonal contacts. Package functions can also identify and quantify “contacts” between individuals and fixed areas (e.g., home ranges, waterbodies, buildings, etc.). As such, this package is an incredibly useful resource for facilitating epidemiological, ecological, ethological and sociological research.

Here we demonstrate how to use contact functions to:

1.) Create a point-location-based environmental-contact network

2.) Create polygon-intersection-based contact networks using point-location data

3.) Test networks against NULL models

Please note that a manuscript detailing novel methodologies we’ve developed for accounting for space and real-time location-system accuracy when creating contact networks from point-location data was recently published in Ecology and Evolution. This same manuscript introduces the contact package to the scientific community. Our methods are described briefly in this vignette, but are presented in much greater detail in our manuscript, which is available at https://doi.org/10.1002/ece3.6225.

#load the contact package
library(contact)

Section 1.) Creating a point-location-based environmental contact network

Here, we show how to create a contact network detailing the number of contacts each tracked individual has with fixed area/points. In this case, our network will represent contacts between calves in a single feedlot pen with their water trough (for which we know the coordinates). The data set we use, calves, is comprised of point-locations collected using a radio-telemetry-based real-time location system (RTLS) (Smartbow GmbH, Weibern, Austria) to monitor the locations of n = 10 steers in a single 30m X 35m feedlot pen located at the Kansas State University Beef Cattle Research Center in Manhattan, KS. Manufacturer-reported RTLS spatial resolution and accuracy was 0.5m and 90%, respectively (i.e., 90% of location fixes are within 0.5m of animals’ true locations). Tracked steers were approximately 1.5 years old, with estimated 1.5-m nose-to-tail lengths and 0.5-m shoulder widths, and radio-frequency-identication (RFID) tracking devices were located on animals’ left ears. Data in this set were collected continuously between 00:00:00 and 02:00:00 UTC on 06/01/2018 at a 5-10s temporal resolution (i.e., positional fixes for each individual were obtained every 5-10 seconds).

Using the calves data set, we identify the number of “contacts” each individual in had with the water trough in their pen. We define “contact” as occuring when point-locations were within a pre-determined spatial-threshold distance (SpTh) of water-trough edges. In this case, we set our initial SpTh as 0.333 m (i.e., the approximate distace from RFID tags to calves’ noses), then re-define this SpTh to account for RTLS accuracy using the findDistThresh function below.

The steps for environmental-contact network creation are described below.

A.) Ensure all required columns exist in the calves data set (i.e., xy coordinates, unique individual IDs, dateTime).

B.) Calculate distances between the water-trough polygon and calves at each time step.

C.) Identify what SpTh value will allow us to capture 99% of contacts, defined as instances when point-locations were within 0.333 m of the water trough, given the RTLS accuracy.

D.) Identify time points when calves were within the re-adjusted SpTh distance from water trough.

E.) Visualize the contact network with edgeweights weighted according to number of observed contacts.


data("calves") #load the calves data set
calves<-droplevels(calves[1:1000,]) #reduce the size to reduce processing time for this vignette.

A.) Ensure all required columns exist in the calves data set (i.e., xy coordinates, unique individual IDs, dateTime)


head(calves)#The calves data set does not have a singular dateTime column. Rather, it has "date" and "time" columns. We must append a dateTime column to the data frame.
#>   calftag     x     y     time       date
#> 1     101 63.38 47.43 00:00:14 05-02-2016
#> 2     101 63.48 46.59 00:00:22 05-02-2016
#> 3     101 62.90 47.07 00:00:26 05-02-2016
#> 4     101 63.27 47.26 00:00:34 05-02-2016
#> 5     101 63.04 46.62 00:00:38 05-02-2016
#> 6     101 63.53 47.24 00:00:42 05-02-2016

system.time(calves.dateTime<- contact::datetime.append(x = calves, date = calves$date, time= calves$time, dateTime = NULL, dateFormat = "mdy", dateFake = FALSE, startYear = NULL, tz.in = "UTC", tz.out = NULL, month = FALSE, day = FALSE, year = FALSE, hour = FALSE, minute = FALSE, second = FALSE, daySecond = FALSE, totalSecond = FALSE))
#>    user  system elapsed 
#>   0.155   0.005   0.160

B.) Calculate distances between the water-trough polygon and calves at each time step.

First, we must define the location of the water trough. To do this, we read in point-location data for the water-trough vertices.


water<- data.frame(x = c(61.43315, 61.89377, 62.37518, 61.82622), y = c(62.44815, 62.73341, 61.93864, 61.67411)) #This is a data frame containing the x and y coordinates of the four trough vertices.

As noted in the dist2Area_df help documention, polygon-vertex coordinates must be arranged in a particular way. Here we arrange them accordingly.


water_poly<-data.frame(matrix(ncol = 8, nrow = 1)) #(ncol = number of vertices)*2
colnum = 0
for(h in 1:nrow(water)){
  water_poly[1,colnum + h] <- water$x[h] #pull the x location for each vertex
  water_poly[1, (colnum + 1 + h)] <- water$y[h] #pull the y location for each vertex
  colnum <- colnum + 1
}

Calculate distances between calves and the water polygon at every timestep.


system.time(water_distance<-contact::dist2Area_df(x = calves.dateTime, y = water_poly, x.id = "calftag", y.id = "water", dateTime = "dateTime", point.x = calves.dateTime$x, point.y = calves.dateTime$y, poly.xy = NULL, parallel = FALSE, dataType = "Point", lonlat = FALSE, numVertices = NULL)) #note that the poly.xy and numVertices arguments refer to vertices of polygons in x, not y. Because dataType is "Point," not "Polygon," these arguments are irrelevant here.
#>    user  system elapsed 
#>   4.200   0.039   4.240

head(water_distance)
#>              dateTime totalIndividuals individualsAtTimestep  id dist.to.water
#> 1 2016-05-02 00:00:14                3                     1 101      14.32860
#> 2 2016-05-02 00:00:22                3                     1 101      15.17450
#> 3 2016-05-02 00:00:26                3                     1 101      14.64353
#> 4 2016-05-02 00:00:34                3                     1 101      14.48624
#> 5 2016-05-02 00:00:38                3                     1 101      15.10296
#> 6 2016-05-02 00:00:42                3                     1 101      14.53432

C.) Identify what SpTh value will allow us to capture 99% of contacts, defined as instances when point-locations were within 0.333 m of the water trough, given the RTLS accuracy.


SpThValues<-contact::findDistThresh(n = 100000, acc.Dist1 = 0.5, acc.Dist2 = NULL, pWithin1 = 90, pWithin2 = NULL, spTh = 0.5) #spTh represents the initially-defined spatial threshold for contact. Note that we've chosen to use 100,000 in-contact point-location pairs here.

SpThValues #it looks like an adjusted SpTh value of approximately 1.74 m will likely capture 99 to 100% of contacts, defined as instances when point-locations were within 0.333 m of the water trough, given the RTLS accuracy. #Note that because these confidence intervals are obtained from distributions generated from random samples, every time this function is run, results will be slightly different. 
#> $distribution.summary
#>        mean          sd         min         max         TPR 
#> 0.706339284 0.345804618 0.004458383 2.523259296 0.305680000 
#> 
#> $CI_upper
#>    50%-CI    55%-CI    60%-CI    65%-CI    70%-CI    75%-CI    80%-CI    85%-CI 
#> 0.7070769 0.7071654 0.7072596 0.7073613 0.7074727 0.7075972 0.7077407 0.7079135 
#>    90%-CI    95%-CI    99%-CI 
#> 0.7081380 0.7084826 0.7091560 
#> 
#> $CI_lwr
#>    50%-CI    55%-CI    60%-CI    65%-CI    70%-CI    75%-CI    80%-CI    85%-CI 
#> 0.7056017 0.7055132 0.7054189 0.7053173 0.7052059 0.7050813 0.7049379 0.7047651 
#>    90%-CI    95%-CI    99%-CI 
#> 0.7045406 0.7041960 0.7035225 
#> 
#> $spTh.adjustments
#>  spTh_84%Capture  spTh_98%Capture spTh_100%Capture 
#>         1.052144         1.397949         1.743753 
#> 
#> $contact.frequency
#>  freq_84%Capture  freq_98%Capture freq_100%Capture 
#>          0.83481          0.96692          0.99616

CI_99<-unname(SpThValues[["spTh.adjustments"]][3]) #we will use this SpTh value moving forward.

D.) Identify time points when calves were within the re-adjusted SpTh distance from water trough.


system.time(water_contacts <- contact::contactDur.area(water_distance, dist.threshold=CI_99,sec.threshold=1, blocking = FALSE, equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE)) #Note that because we are not interested in making a time-aggregated network with > 1 temporal levels, we set blocking = FALSE to reduce processing time.
#>    user  system elapsed 
#>   0.182   0.035   0.217

head(water_contacts)
#>   indiv.id area.id contact.id contactDuration    contactStartTime
#> 1      101   water  101-water               1 2016-05-02 00:47:31
#> 2      101   water  101-water               1 2016-05-02 00:47:35
#> 3      101   water  101-water               1 2016-05-02 00:47:39
#> 4      101   water  101-water               1 2016-05-02 00:47:43
#> 5      101   water  101-water               1 2016-05-02 00:47:47
#> 6      101   water  101-water               1 2016-05-02 00:47:55
#>        contactEndTime distThreshold secThreshold equidistant.time
#> 1 2016-05-02 00:47:31      1.743753            1            FALSE
#> 2 2016-05-02 00:47:35      1.743753            1            FALSE
#> 3 2016-05-02 00:47:39      1.743753            1            FALSE
#> 4 2016-05-02 00:47:43      1.743753            1            FALSE
#> 5 2016-05-02 00:47:47      1.743753            1            FALSE
#> 6 2016-05-02 00:47:55      1.743753            1            FALSE

E.) Visualize the contact network with edgeweights weighted according to number of observed contacts.


system.time(water_edges<- contact::ntwrkEdges(x = water_contacts, importBlocks = FALSE, removeDuplicates = TRUE)) #get specific weighted edges
#>    user  system elapsed 
#>   0.019   0.006   0.024

head(water_edges)
#>   from    to durations
#> 1  101 water        18
#> 2  102 water        49

Now we can vizualize the network using igraph functions.


water.network <- igraph::simplify(igraph::graph_from_data_frame(d=water_edges, directed=F, vertices =  c(seq(101,110), "water")),remove.multiple = T, remove.loops = T) #Note that we have to specify the nodes here because not all calves were observed in contact with the water trough.
igraph::V(water.network)$color<- "orange1" #make calf nodes orange
igraph::V(water.network)$color[length(igraph::V(water.network))]<- "steelblue1" #make water node blue
igraph::V(water.network)$label<-NA #no need to label nodes
igraph::V(water.network)$size <-13
igraph::V(water.network)$shape<-c(rep("circle", (length(igraph::V(water.network)) - 1)), "square") #make the calf nodes circular and the water node square
igraph::E(water.network)$width <- water_edges$duration/10 #edge width is proportional to contact frequency
igraph::E(water.network)$color <- "black" #make edges black
watercoords1<- igraph::layout_as_star(water.network, center = igraph::V(water.network)[length(igraph::V(water.network))]) #set the center of the star layout as the water polygon
igraph::plot.igraph(water.network, layout = watercoords1)

### Section 2.) Creating polygon-intersection-based direct-contacts network using point-location data

Note that this section was run with eval = FALSE because otherwise the processing time for this vignette is longer than what is allotted by CRAN

Here, we show how to create contact networks detailing the number of direct, physical contacts between tracked individuals. We derive polygons represntative of animals’ physical space from point-locations. We then define “contact” as occuring when polygons intersect (i.e., SpTh == 0). However, we re-define this SpTh to account for RTLS accuracy using the findDistThresh function below.

The data set we use, calves2018, is comprised of point-locations collected using a radio-telemetry-based real-time location system (RTLS) (Smartbow GmbH, Weibern, Austria) to monitor the locations of n = 20 steers in a 30m X 35m feedlot pen located at the Kansas State University Beef Cattle Research Center in Manhattan, KS. Manufacturer-reported RTLS spatial resolution and accuracy was 0.5m and 90%, respectively (i.e., 90% of location fixes are within 0.5m of animals’ true locations). Tracked steers were approximately 1.5 years old, with estimated 1.5-m nose-to-tail lengths and 0.5-m shoulder widths, and radio-frequency-identication (RFID) tracking devices were located on animals’ left ears. Data in this set were collected continuously between 00:00:00 06/01/2018 and 11:59:59 06/03/2018 UTC at a 5-10s temporal resolution (i.e., positional fixes for each individual were obtained every 5-10 seconds). However, to reduce processing time in this vignette, we subset the calves2018 data set to only contain point-locations observed between 00:00:00 and 11:59:59 UTC on 06/01/2018.

The steps for direct-contact-network creation are described below.

A.) Subset calves2018 and ensure all required columns exist in the calves data set (i.e., xy coordinates, unique individual IDs, dateTime).

B.) Clean and filter the data set.

C.) Temporally smooth point-locations, ensuring that animals are observed at identical timepoints (Note: this was an unnecessary step in Section 1 because the water trough’s location was fixed and unchanging over time).

D.) Derive calf head and body polygons from point locations.

Note: This is done using a novel methodology described in brief on the contact::referencePoint2Polygon and contact::referencePoint2Polygon help pages, and in detail in Farthing et al. 2020 (the same manuscript in which we introduce this package to the scientific community).

E.) Calculate inter-animal distances at each time step.

F.) Identify what SpTh value will allow us to capture 99% of contacts, polygon intersections, given the RTLS accuracy.

G.) Identify time points when calf heads were within the re-adjusted SpTh distance from one another or body polygons.

H.) Visualize the contact networks with edgeweights weighted according to number of observed contacts.

data("calves2018") #load the data set

A.) Subset calves2018 and ensure all required columns exist in the calves data set (i.e., xy coordinates, unique individual IDs, dateTime)


head(calves2018) #see that all necessary columns are there.

We will only look at points from the 1st date represented in the data set (06/01/2018). Therefore, we need to get the unique date values in the data.


calves2018$date<-lubridate::date(calves2018$dateTime) #add the date column to the data set

calves06012018 <- droplevels(subset(calves2018, date == unique(calves2018$date)[1])) #pull the observations from 06/01/2018

B.) Clean and filter the data set.

Here we run the various spatiotemporal-data-filtering functions offered by the package (i.e., mps, confine, dup). We run them in this particular order because one filtration process may trigger others to remove points that are not necessariliy erroneous. For example, removing duplicated point-locations could create a situation where it appears that individuals moved at a speed exceeding a specified mps threshold. Thus, in this scenario, subsequently running the mps function may remove points that would not have been flagged for removal prior to removing duplicated points.

First, we ensure that points do not represent impossible/highly unlikely movement speeds.


system.time(calves_filter1 <- contact::mps(calves06012018, id = calves06012018$calftag, point.x = calves06012018$x, point.y = calves06012018$y, dateTime = calves06012018$dateTime, mpsThreshold = 10, lonlat = FALSE, parallel = FALSE, filterOutput = TRUE)) #we assume that if calves' point-locations suggest they moved faster than 10m/s, points are erroneous and should be removed. #This did not remove any observations.

Now we want to ensure that all points are within the specific feedlot pen boundaries. As calves could not escape fedlot pens, points outside the pen or adjacent feed bunks are erroneous.


confinementCoords <- data.frame(object = c("feed", "feed", "feed", "feed","fence", "fence", "fence", "fence", "fence"), x = c(46.0118, 46.6482, 58.3415, 57.6507, 60.5775, 29.3054, 16.7602, 17.0590, 46.0309), y = c(197.0570, 197.4131, 175.9618, 175.6284, 170.4628, 153.6002, 176.5861, 181.6315, 197.1261)) #these are the x & y coordinates for the feedlot-pen fenceline and adjacent feedbunk vertices. Note that they are arranged in a clockwise order, as the confine function below requires input vertices to be ordered clockwise or counter-clockwise.

plot(confinementCoords$x,confinementCoords$y,lines(c(confinementCoords$x, confinementCoords$x[1]),c(confinementCoords$y, confinementCoords$y[1])), pch=16, main = "confinement polygon") #this is what the pen outline looks like. 

system.time(calves_filter2<-contact::confine(calves_filter1, point.x = calves_filter1$x, point.y = calves_filter1$y, confinementCoord.x = confinementCoords$x, confinementCoord.y = confinementCoords$y, filterOutput = TRUE)) #this removed an additional 1784 observations

Finally, we want to remove duplicate observations. This is an important step because many data-processing functions below require that no duplicates exist.


system.time(calves_filter3<- contact::dup(calves_filter2, id = calves_filter2$calftag, point.x = calves_filter2$x, point.y = calves_filter2$y, dateTime = calves_filter2$dateTime, avg = FALSE, parallel = FALSE, filterOutput = TRUE)) #it looks like there were no duplicates to remove in the first place. We're ready to proceed with analyses.

C.) Temporally smooth point-locations, ensuring that animals are observed at identical timepoints

Note: This was an unnecessary step in Section 1 because the water trough’s location was fixed and unchanging over time.


#create our data set that shows calves average position every 10 seconds
system.time(calves.10secSmoothed <- contact::tempAggregate(x = calves_filter3, id = calves_filter3$calftag, point.x = calves_filter3$x, point.y = calves_filter3$y, dateTime = calves_filter3$dateTime, secondAgg = 10, extrapolate.left = FALSE, resolutionLevel = "reduced", extrapolate.right = FALSE, na.rm = TRUE, smooth.type = 2)) 

Note that we set na.rm to TRUE and resolutionLevel to “reduced” here. This means that temporal intervals between observed relocations may not necessarily be equidistant, as when individuals were not observed at a given time point, they are removed from the data set (rather than introducing an NA instead). This speeds up later processing functions (because the size of calves.10secSmoothed is relatively small). Please note, however, that having equidistant temporal intervals between observed relocations is a requirement for some contact-function processes (e.g., a variation of the path-randomization procedure described by Spiegel et al. 2016 that can be implemented using contact::randomizePaths). Thus, before these specific processes could be run, the data set must be re-processed with na.rm == FALSE.

D.) Derive calf head and body polygons from point locations.

We could transform empirical point-locations into polygon vertices one vertex at a time by repeatedly using the repositionReferencePoint function, but the referencePoint2Polygon function provides an expediant shortcut for generating square/rectangular polygons from point-locations. Moving forward, we assume that calves’ body sizes are equivalent and stable, and that 0.333 m X 0.333 and 1.167 m X 0.5 m polygons accurately represent areas where calves’ heads and bodies exist, respectively.


##Create 0.333 m X 0.333 m calf head polygons.
#Note that this is done using the original reference points, which denote the locations of RFID tags on individuals' left ears.
system.time(calf_heads <- contact::referencePoint2Polygon(x = calves.10secSmoothed, id = calves.10secSmoothed$id, dateTime = calves.10secSmoothed$dateTime, point.x = calves.10secSmoothed$x, point.y = calves.10secSmoothed$y, direction = NULL, StartLocation = "DL", UpDownRepositionLen = 0.333, LeftRightRepositionLen = 0.333, CenterPoint = FALSE, MidPoints = FALSE, immobThreshold = 0, parallel = FALSE, modelOrientation = 90)) #Note that we do not specify an immobility threshold here. This is because the head polygons will later be unioned with body polygons generated from different point-locations, and prematurely thresholding them will potentially cause misalignment between the two polygons.

head(calf_heads) 

When creating the head polygons, we set direction == NULL because we do not have gyroscopic data detailing movement directions associated with each relocation event. Thus, polygons are subject to the assumptions pertaining to deriving polygons from point-locations as described by Farthing et al. (2020). (see the contact::referencePoint2Polygon help page for a brief description of these assumptions).

Note that the referencePoint2Polygon function assumes the input point-locations represent a single vertex of desired polygons. Because calves’ heads are not the same width as their bodies (i.e., 1.167 m X 0.5 m ; L X W), and are assumed to be centered at the front of the body, however, we must move reference points (on the left ear) to the left by 0.0835 m to reposition them at the upper-left corner of calves bodies before creating body polygons. Additionally, note that we are assuming ears are parallel to shoulder-tips.


system.time(leftShoulder.point<-contact::repositionReferencePoint(x = calves.10secSmoothed, id = calves.10secSmoothed$id, dateTime = calves.10secSmoothed$dateTime, point.x = calves.10secSmoothed$x, point.y = calves.10secSmoothed$y, direction = NULL, repositionAngle = 180, repositionDist = 0.0835, immobThreshold = 0, parallel = FALSE, modelOrientation = 90))  #Again, see that we do not specify a immobility threshold here.

Now we have the left-shoulder points, but before we create the bodies, let’s take a moment to examine these new points more closely.


head(leftShoulder.point) 

See that moving the points changed the calculated dx,dy values from those associated with empirical data points. Therefore, movement angles calculated from these data points will differ from values calculated using the empirical observations. Because of that, when creating body polygons, we must specify that the function use angle values from the empirical data set. Luckily, the function allows for this very easily.

Additionally, the differing calculated-distance values will potentially cause misalignment errors when unioning the head and body polygons if either is created with a specified immobility threshold (see referencePoint2Polygon help page). As outlined by Farthing et al. (2020), in the absence of accompanying gyroscopic data, movement angles are calculated based on observed relocation events with the assumption that all individuals are forward-facing when moving. Thus, specifying an immobility threshold (i.e., observed relocation distance required for us to state that movement actually occurred) is necessary to reduce directional errors by discounting observed movements so miniscule that the majority of the modeled physical-space is likely unaffected (e.g., head shaking), and therefore would not be repositioned. Thus, prior to unioning the polygons, we will not implement an immobility threshold.

Moving forward, let’s generate the vertices for calves’ anterior- and posterior-body polygons (i.e., we’re dividing body polygons into two halves). Rather than running the referencePoint2Polygon function twice, we instead set MidPoints = TRUE, which will effectively identify vertices for the bottom of anterior body sections/top of posterior ones.


system.time(calf_bods <- contact::referencePoint2Polygon(x = leftShoulder.point, id = leftShoulder.point$id, dateTime = leftShoulder.point$dateTime, point.x = leftShoulder.point$x.adjusted, point.y = leftShoulder.point$y.adjusted, direction = leftShoulder.point$movementDirection, StartLocation = "UL", UpDownRepositionLen = 1.167, LeftRightRepositionLen = 0.5, CenterPoint = FALSE, MidPoints = TRUE, immobThreshold = 0, parallel = FALSE, modelOrientation = 90)) #note that direction == leftShoulder.point$movementDirection. 

head(calf_bods) #notice the additional columns compared to calf_heads

Now we can take vertices from calf_heads and calf_bods, and create a vertex set {V(it)} delineating the calves full bodies (i.e., we essentially union the calf_heads and calf_bod polygons). Note that in this calf_FullBody data frame, vertex1 is located on calves left shoulders, and vertices are ordered in a clockwise direction from that point.


calf_FullBody <- data.frame(calf_id = calf_bods$id, vertex1.x = calf_bods$cornerPoint1.x, vertex1.y = calf_bods$cornerPoint1.y, vertex2.x = calf_heads$cornerPoint1.x, vertex2.y = calf_heads$cornerPoint1.y, vertex3.x = calf_heads$cornerPoint2.x, vertex3.y = calf_heads$cornerPoint2.y, vertex4.x = calf_heads$cornerPoint3.x, vertex4.y = calf_heads$cornerPoint3.y, vertex5.x = calf_heads$cornerPoint4.x, vertex5.y = calf_heads$cornerPoint4.y, vertex6.x = calf_bods$cornerPoint2.x, vertex6.y = calf_bods$cornerPoint2.y,  vertex7.x = calf_bods$midPoint2.x, vertex7.y = calf_bods$midPoint2.y, vertex8.x = calf_bods$cornerPoint3.x, vertex8.y = calf_bods$cornerPoint3.y, vertex9.x = calf_bods$cornerPoint4.x, vertex9.y = calf_bods$cornerPoint4.y, vertex10.x = calf_bods$midPoint4.x, vertex10.y = calf_bods$midPoint4.y, dateTime = calf_bods$dateTime)
                            
head(calf_FullBody)

This is an example of what a calf full-body polygon looks like.


fullBodyExample <- data.frame(section = c("body", rep("head", 4), rep("body", 5)), x = unname(unlist(calf_FullBody[10,c(seq(2,21, by =2))])), y = unname(unlist(calf_FullBody[10,c(seq(3,21, by =2))])))

{plot(fullBodyExample$x,fullBodyExample$y, col = as.factor(fullBodyExample$section), lines(c(fullBodyExample$x, fullBodyExample$x[1]),c(fullBodyExample$y, fullBodyExample$y[1])), pch=16, main = "Calves' body shape")
 legend(39.2, 193.8, col = c(1,2), legend = c("body", "head"), cex = 0.7, pch = 16)}

Recall that these polygons were created without specifying immobility threshold values.

For reference, we show an example of a misalignment that may occur when enforcing an immobility threshold of 0.1m for head and body polygons prior to unioning them. Note that this is the same polygon that was shown above (i.e., same individual at the same time step).

So, even when polygons are ultimately derived from the same empirical data, it’s clear that repositioning the reference point-locations to create different polygons does not allow for the implementation of immobility thresholds prior to unioning procedures.

We are still interested in reducing directional errors by implementing this threshold, however. We can do that now with a simple for-loop.


immobility.vec<- (which(leftShoulder.point$dist.original < 0.1) + 1) #create a vector describing which polygons in calf_heads2 moved < 0.1m. Note that we add 1 to returned values because leftShoulder.point$dist.original details the distance to the successive point, but we want distance to the proceeding point.

calf_FullBody.immob<-calf_FullBody
calf_bods.matrix<-as.matrix(calf_FullBody[,2:21]) #convert vertex columns to matrix for speed

for(l in 1:length(immobility.vec)){ 
  calf_bods.matrix[immobility.vec[l],] <- calf_bods.matrix[(immobility.vec[l]-1),] #replace l-th observations in calf_FullBody with the preceding coordinates.
}

calf_FullBody.immob[,2:21]<- calf_bods.matrix #replace with the updated locations.

Now that we have defined the (x,y) coordinates for the full-body vertex set {V(it)}, we can use these polygons to generate contact networks. Before we move on, however, we will briefly mention another aspect of polygon-derivation that may also introduce movement-orientation error into the data set. As we note on the referencePoint2Polygon help page, in the absence of accompanying gyroscopic-movement data, under normal circumstances we would consider removing observations from our vertex set that represent movements outside of a pre-determined dt (i.e., length of time between consecutive relocation events) threshold. This would be done to ensure that relocation intervals are sufficiently small to minimize chances that animals face unknown directions between observed relocations. For the purposes of this vignette, however, we will assume that the dt-introduced errors are irrelevant.

E.) Calculate inter-animal distances at each time step.

The first step to network creation is calculating the distance between calves at each time step.

Note: the dist2All_df does not allow NA values in polygon coordinates (recall that the polygon derivation process introduced some NAs). Before running this function, we must remove these NA coordinates.


naObs<-which(is.na(calf_FullBody.immob$vertex10.x) == T) #by identifying where NAs were introduced for any body vertex (other than vertex 1, which was left-shoulder point used to generate other vertices), we can determine what rows to remove. Note: we use a body vertex because two NAs were introduced (i.e., one from left-shoulder repositioning and another from polygon creation), as opposed to only one.

FullBody_noNAs<-droplevels(calf_FullBody.immob[-naObs,]) #remove NA coordinates

Now we can proceed with calculating inter-animal distances.


system.time(fullbody_distances<-contact::dist2All_df(x = FullBody_noNAs, id = FullBody_noNAs$calf_id, dateTime = FullBody_noNAs$dateTime, point.x = NULL, point.y = NULL, poly.xy = FullBody_noNAs[,2:21], elev = NULL, parallel = FALSE, dataType = "Polygon", lonlat = FALSE, numVertices = 10)) #these are the distances from the nearest polygon edges (i.e., not distance from centroids).

head(fullbody_distances) #note that if individuals were not observed in the data at a specific time, the function reports NAs for their respective distances.

F.) Identify what SpTh value will allow us to capture 99-percent of contacts, polygon intersections, given the RTLS accuracy.

We initially define “contact” as occurring when two polygons intersect (i.e., SpTh = 0) on any given time step. As we did in Section 1, however, we want to account for RTLS positional accuracy by sampling from a multivariate distribution. Thus, here we seek to determine what SpTh value will likely capture 99-percent of contacts, defined as polygon intersections, and re-define “contact” accordingly.


polySpThValues<-contact::findDistThresh(n = 100000, acc.Dist1 = 0.5, acc.Dist2 = NULL, pWithin1 = 90, pWithin2 = NULL, spTh = 0) #spTh represents the initially-defined spatial threshold for contact. #spTh represents the initially-defined spatial threshold for contact. Note that we've chosen to use 100,000 in-contact point-location pairs here.

polySpThValues #it looks like an adjusted SpTh value of approximately 1.38 m will likely capture 99% of contacts, defined as instances when point-locations were within 0 m of one another, given the RTLS accuracy. #Note that because these confidence intervals are obtained from distributions generated from random samples, every time this function is run, results will be slightly different. 

polyCI_99<-unname(polySpThValues[["spTh.adjustments"]][3]) #we will use this SpTh value moving forward.

G.) Identify time points when calf heads were within the re-adjusted SpTh distance from one another or body polygons.


system.time(calf_fullBody_contacts <- contact::contactDur.all(fullbody_distances,dist.threshold=polyCI_99,sec.threshold=10, blocking = FALSE, equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE)) #Note that because we are not interested in making a time-aggregated network with > 1 temporal levels, we set blocking = FALSE to reduce processing time.

H.) Visualize the contact networks with edgeweights weighted according to number of observed contacts.


#Generate a static network edge set (aggregated to the day resolution) 
system.time(fullBody_edges<- contact::ntwrkEdges(x = calf_fullBody_contacts, importBlocks = FALSE, removeDuplicates = TRUE))

#visualize the network
fullBody.network <- igraph::simplify(igraph::graph_from_data_frame(d=fullBody_edges, directed=F),remove.multiple = T, remove.loops = T) #create network

igraph::V(fullBody.network)$color<- "orange1"
igraph::V(fullBody.network)$size <-13
igraph::E(fullBody.network)$width <- fullBody_edges$duration/50 #edge width is proportional to contact frequency
igraph::E(fullBody.network)$color <- "black"
igraph::plot.igraph(fullBody.network, vertex.label.cex=0.4, layout = igraph::layout.circle) 

Section 3.) Test networks against NULL models

Note that this section was run with eval = FALSE because otherwise the processing time for this vignette is longer than what is allotted by CRAN

Farine (2017) provides a general overview of the importance of NULL-model creation and evaluation. Briefly, comparing empirical contact networks to NULL models, allows us to determine if observed contacts occur more or less frequently than would be expected at random. This allows us to test a number of hypotheses. For example, by comparing contact-networks derived from tracked sleepy lizard (Tiliqua rugosa) point-locations, Spiegel et al. (2016) determined that this species exhibits sociality. The point-location randomization procedure they developed for NULL-model creation accounts for environmental drivers of movement, suggesting that any observed differences in contact frequency (relative to NULL models) are driven by social behaviors. This point-location randomization procedure can be implemented through the randomizePaths function.

Here, using previously-loaded/created data, we demonstrate how to create NULL contact-network models and how to statistically test network models against one another using contact functions. Specifically, we determine if the total hourly number of full-body contacts occur more or less frequently than would be expected at random in the calves06012018 data set. Furthermore, we use a Mantel test (Mantel 1967) to assess if head X head and full-body contact networks are related to one another.

The steps for NULL-model creation and evaluation are described below.

A.) Generate the empirical hourly head X head and full-body contact networks.

B.) Temporally smooth previously-filtered calves2018 point-locations, ensuring that animals are observed at identical timepoints AND that all timepoints are equidistant (Recall that equidistant timepoints are a requirement for using contact::randomizePaths to randomize point-locations according to the Spiegel et al (2016) method variant we will use below).

C.) Randomize the newly-created point-location data set.

D.) Generate contact networks from randomized data (NULL models).

E.) Test empirical networks against NULL models.

F.) Test empirical networks against each other.

Recall the data sets created in Section 2 that we will continue using.


head(calves06012018) #point-location data set to be temporally smoothed
head(FullBody_noNAs) #polygon data set
head(fullbody_distances) #distances between each full-body polygon at each timestep
polyCI_99 #adjusted polygon-contact SpTh value that likely captures 99% of of inter-calf contacts, defined as instances when calf polygons intersect, given the RTLS accuracy.

A.) Generate the hourly head X head and full-body contact networks.

Here we create a time-aggregated contact network describing head X head contacts and full-body contacts (i.e., body polygons unioned with head polygons) between calves each hour of 06/01/2018.


system.time(calf_fullBody_contacts.hr <- contact::contactDur.all(fullbody_distances,dist.threshold=polyCI_99,sec.threshold=10, blocking = TRUE, blockUnit = "hours", blockLength = 1, equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE)) #Note that the difference between this edge set and the one created in Section 2 is that blocking is set to TRUE here. 

headVertexColumns<-4:11 #these are the columns within FullBody_noNAs representative of head polygons

system.time(head_distances<-contact::dist2All_df(x = FullBody_noNAs, id = FullBody_noNAs$calf_id, dateTime = FullBody_noNAs$dateTime, point.x = NULL, point.y = NULL, poly.xy = FullBody_noNAs[,headVertexColumns], elev = NULL, parallel = FALSE, dataType = "Polygon", lonlat = FALSE, numVertices = 4)) #Note that the difference between this distance set and the one created in Section 2 is that poly.xy and numVertices arguments refer to head polygons only 

system.time(calf_head_contacts.hr <- contact::contactDur.all(head_distances,dist.threshold=polyCI_99,sec.threshold=10, blocking = TRUE, blockUnit = "hours", blockLength = 1, equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE)) 

Now that we have our empirical edge set to be analyzed, we need to create our NULL model.

B.) Randomize the newly-created point-location data set.

In accordance with the Spiegel et al. (2016) procedure, individuals’ movement paths (i.e., sequential relocation events) within defined time blocks (1 hour-length blocks here) will be shuffled around in time, so that individuals visit the same places, just not at the same times. For a more detailed description of the relationship between blockUnits and shuffleUnits in the randomizePaths function, please see the randomizePaths function help page. In the interest of relatively quick processing speeds later on, we will only create 2 randomized replicates here.


nRandomizations <- 2 #we will create 2 randomized-hour replicates.

system.time(calf_fullBody.random.list <- contact::randomizePaths(x = FullBody_noNAs, id = FullBody_noNAs$calf_id, dateTime = FullBody_noNAs$dateTime, point.x = NULL, point.y = NULL, poly.xy = FullBody_noNAs[,2:21], parallel = FALSE, dataType = "Polygon", numVertices = 10, blocking = TRUE, blockUnit = "hours", blockLength = 1, shuffle.type = 1, indivPaths = TRUE, numRandomizations = nRandomizations)) #see that we can directly randomize the polygon set. There's no need to take it back to the point data (unless you really want to). See also that we randomize the polygon set with NAs already removed so that we don't have to do it again.

head(data.frame(calf_fullBody.random.list[[1]])) #here's what the output looks like when you set shuffle.type == 1. See that there are separate coordinate columns designated "rand."

C.) Generate contact networks from randomized data (NULL model).

Now we need to put the randomized data through the same network-creation procedures we used for the empirical data. Note that many of the function arguments below now use take column names as inputs, as opposed to vectors of length nrow(x), which we used above. This allows us input list objects as “x” in these functions, with variable values called by other arguments.


fullBodyVertexColnames<- colnames(data.frame(calf_fullBody.random.list[[1]]))[27:46] #these are the column names of columns in the data frames contained within calf_fullBody.random.list that contain randomized full-body-polygon-vertex information.

system.time(fullBody_distances.random<-contact::dist2All_df(x = calf_fullBody.random.list, id = "id", dateTime = "dateTime", point.x = NULL, point.y = NULL, poly.xy = fullBodyVertexColnames, elev = NULL, parallel = FALSE, dataType = "Polygon", lonlat = FALSE, numVertices = 10)) #Note that the difference between this distance set and the one created in Section 2 is that x is a list and other arguments are given column-name information rather than vectors of length(nrow(x)).

Having calculated inter-polygon distances at each time step, we can now create the contact networks from randomized data sets (i.e., NULL models). Note that we continue to use polyCI_99 as the SpTh value. Note that, though the randomized data sets were generated using point-locations where relocations were separated by equidistant timepoints, because we removed polygons with NAs so that the dist2All_df function would work, we can no longer gaurantee that the equidistant-timepoint assumption is valid. Thus, when running the contactDur.all function below, we set the equidistant.time argument to FALSE.


system.time(calf_fullBody_contacts.hr.random <- contact::contactDur.all(x = fullBody_distances.random, dist.threshold = polyCI_99, sec.threshold=10, blocking = TRUE, blockLength = 1, blockUnit = "hours", equidistant.time = FALSE, parallel = FALSE, reportParameters = TRUE)) 

E.) Test empirical networks against NULL model.

Through the contactCompare_chisq function we will use Chi-squared tests to evaluate if differences exist between observed fullbody contacts and expected contacts described in the NULL model. Essentially, we will perform a “goodness-of-fit” test to determine if empirical data fit the distribution describe in the NULL model. Therefore our NULL hypothesis/hypotheses is that the distribution of empirical contacts is equivalent to that NULL-model contacts, and our alternative hypothesis is that a difference exists.

The function takes four data inputs that we must generate before any analyses can be done: x.summary, y.summary, x.potential, and y.potential. The .summary inputs represent summarized edge weights in the empirical and randomized edge sets, respectively. While, the .potential inputs show the maximum number of potential timeBlocks each individual could have been present for over the course of the data-collection period. emp.input and rand.input represent the empirical and NULL netowork edgesets, respectively.


#create x.potential
system.time(x.potential <- contact::potentialDurations(fullbody_distances, blocking = TRUE, blockLength = 1, blockUnit = "hours", distFunction = "dist2All_df")) # see blocking information here

head(x.potential) #see the layout of x.potential

#create y.potential
system.time(y.potential <- contact::potentialDurations(fullBody_distances.random, blocking = TRUE, blockLength = 1, blockUnit = "hours", distFunction = "dist2All_df"))

#summarize empirical contacts 
system.time(x.summary <- contact::summarizeContacts(x = calf_fullBody_contacts.hr, importBlocks = TRUE, parallel = FALSE)) #note that 
head(x.summary) #see the layout of x.summary

#summarize randomized contacts 
system.time(y.summary <- contact::summarizeContacts(x = calf_fullBody_contacts.hr.random[[2]], importBlocks = TRUE, avg = TRUE, parallel = FALSE))

A few things to note in the summary processing immediately above: 1.) When creating the x.summary object, we could have set importBlocks = FALSE to return edge weights summed over all time blocks. However, in this example we’re interested in comparing total hourly contact rates to the NULL model, so here, in both the x.potential and x.sumary objects we ensure that block information is present. 2.) See that we averaged the edge weights of NULL-model list objects together by setting avg = TRUE in the summarizeContacts function.

Now we can proceed with the comparison.

Note that because of the warning settings in the chisq.test function, we cannot effectively silence the error “Chi-squared approximation may be incorrect.” As such, you may get many warnings to that end.

These warnings are normal and occur when expected pairwise counts are are very small, leading to potentially innacurate chi-squared values. Warnings for specific pairwise tests can be seen in the “warnings” column of the data frames within function output. If you receive these warnings, consider using contactCompare_binom instead contactCompare_chisq.


system.time(fullBody_NULLTest1<-contact::contactCompare_chisq(x.summary, y.summary, x.potential, y.potential, importBlocks = TRUE, shuffle.type = 1, popLevelOutput = TRUE, parallel = FALSE)) #Note that we MUST indicate the shuffle.type used to randomize point-locations.

See that when the “popLevelOutput” argument is TRUE the function returns a list containing two data frames. The first data frame contains Chi-squared test results for individual-level observed node degree, total contact durations, and observed contacts between specific inidividuals. The second data frame contains population-level variations of these metrics. In the case of our example, what we’re after is precisely the popLevelOutput frame. See below that we now have a comparison to the NULL model at each time block.

Note that most of the “metrics” represented here are numbers. These are actually unique calf IDs, and each row indicates contact durations with that specific individual.


head(fullBody_NULLTest1[[2]])

E.) Test empirical networks against each other.

Note that the contactCompare_chisq function SHOULD NOT be used to compare two empirical networks, as the function assumes x.summary and y.summary represent observed and expected values, respectively. If users want to compare two empirical networks, we can compare network similarity using Mantel’s permutation test (Mantel 1967) through the contactCompare_mantel function.

Below we test the similarity of the fullbody and head X head contact networks. See that like contactCompare_chisq, this function requires x.summary and y.summary inputs (though the y.summary DOES NOT have to represent a NULL model). Recall that we have already made the x.summary object, which relates to the fullbody contact network. Before we can run the Mantel test, we must summarize the head X head contact network.


system.time(y.summary2 <- contact::summarizeContacts(calf_fullBody_contacts.hr, importBlocks = FALSE)) #create the new y.summary

system.time(mantelOutput<-contact::contactCompare_mantel(x.summary, y.summary = y.summary2, numPermutations = 1000, alternative.hyp = "two.sided", importBlocks = FALSE))

mantelOutput #based on the reported p-value, given an alpha-level of 0.05, we can reject the NULL hypothesis that these two networks are unrelated. (This is as we would expect, because the head X head-contact network is nested within the fullBody-contact network. If they were not similar, we would have a problem.)

References

Farine, D.R., 2017. A guide to null models for animal social network analysis. Methods in Ecology and Evolution 8:1309-1320. https://doi.org/10.1111/2041-210X.12772.

Farthing, T.S., Dawson, D.E., Sanderson, M.W., and Lanzas, C. 2020. Accounting for space and uncertainty in real-time location system-derived contact networks. Ecology and Evolution 00:1-14. https://doi.org/10.1002/ece3.6225.

Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27:209–220.

Spiegel, O., Leu, S.T., Sih, A., and C.M. Bull. 2016. Socially interacting or indifferent neighbors? Randomization of movement paths to tease apart social preference and spatial constraints. Methods in Ecology and Evolution 7:971-979. https://doi.org/10.1111/2041-210X.12553.