1 - Landscape and genetic data processing with graph4lg

UMR ThéMA - UBFC-CNRS, UMR Biogéosciences - UBFC-CNRS, ARP-Astrance

Paul SAVARY

Introduction

The rationale of graph4lg package in R is to make easier the construction and analysis of genetic and landscape graphs in landscape genetic studies (hence the name graph4lg, meaning Graphs for Landscape Genetics). This package provides users with tools for:

Each one of the included tutorials focuses on one of these points. This first tutorial will focus on landscape and genetic data processing. It will describe the package functions allowing users to:

Data used in this tutorial

The package already includes genetic and spatial simulated data sets allowing users to discover its different functionalities. The first data set (data_simul) was simulated with CDPOP (Landguth and Cushman 2010) on a simulated landscape. It consists of 1500 individuals from 50 populations genotyped at 20 microsatellite loci. Individuals dispersed less when the cost-distance between populations was large. A landscape graph was created with Graphab (Foltête, Clauzel, and Vuidel 2012) whose nodes were the 50 simulated populations and the links were weighted by cost-distance values between populations. The project created with Graphab was included into the package such that the landscape graphs and the cost-distance matrix can be easily imported into the R environment.

The second data set (data_ex) was simulated as the first one but included only 10 populations. It is used to generate quick examples.

Genetic data loading and format conversions

The first type of functions from this package allows users to process genetic data. These functions load genetic data in R environment and convert them when necessary.

In order to make the package user-friendly and compatible with genetic data commonly used in landscape genetics, the functions genepop_to_genind, structure_to_genind, gstud_to_genind and loci_to_genind allow users to convert genetic data from formats used respectively in the gstudio (Dyer 2009) and pegas (Paradis 2010) packages in R and in STRUCTURE (Pritchard, Stephens, and Donnelly 2000) and GENEPOP (Raymond 1995) software into R objects with the class attribute genind from ADEGENET package (Jombart 2008). The format genind makes possible the use of time-efficient functions from adegenet package (coded in C). This package was developed and is regularly maintained by Thibaut Jombart (his website). The function genind_to_genepop enables to convert genind object into text files in format genepop in order to perform analyses with this commonly used R package and executable software. All these functions work with microsatellite data (with 2 or 3-digits allele coding).

The package is compatible with SNPs data as soon as they have been loaded into a genind object (see package vcfR for example).

GENEPOP to genind

The GENEPOP software (Raymond 1995) developed by M. Raymond and F. Rousset (Montpellier, France) can be used as an executable file, with or without graphical user interface, or as an R package. It is frequently used to compute FST values and to test for Hardy-Weinberg equilibrium, linkage disequilibrium or genetic differentiation. Besides, when performing simulations with CDPOP (Landguth and Cushman 2010), individual genotypes can be saved as GENEPOP files at the end of the simulation.

The function genepop_to_genind loads a GENEPOP file (.txt extension) and converts it into a genind object. To use it, the path to the file, the total number and names of the loci and the population names must be indicated.

data_genind <- genepop_to_genind(path = paste0(system.file('extdata', 
                                                           package = 'graph4lg'), 
                                               "/gpop_simul_10_g100_04_20.txt"),
                                 n.loci = 20, pop_names = as.character(1:10))
#> Registered S3 method overwritten by 'ade4':
#>   method      from 
#>   print.amova pegas
data_genind
#> /// GENIND OBJECT /////////
#> 
#>  // 200 individuals; 20 loci; 124 alleles; size: 139.8 Kb
#> 
#>  // Basic content
#>    @tab:  200 x 124 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 3-9)
#>    @loc.fac: locus factor for the 124 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]), 
#>     sep = "/", pop = pop, ploidy = ploidy)
#> 
#>  // Optional content
#>    @pop: population of each individual (group size range: 20-20)

We get a genind object. It contains the genotypes of the 1500 individuals from the 50 populations created during the simulation (similar to the data set data_ex_genind).

genind to GENEPOP

The genind_to_genepop function performs the reverse conversion, i.e. converts a genind object into a GENEPOP file. This file is created and saved in the working directory defined earlier.

genind_to_genepop(x = data_genind, 
                  output = "data_gpop_test.txt")

This function can for example create a GENEPOP file to test for between population genetic differentiation or to compute fixation indices with GENEPOP software.

STRUCTURE to genind

STRUCTURE software (Pritchard, Stephens, and Donnelly 2000) is frequently used in population genetics and landscape genetics. It enables to create population clusters through a Bayesian approach aiming at minimising the deviation from Hardy-Weinberg equilibrium when gathering populations with one another. The input files have a particular structure. The function structure_to_genind converts this type of file into a genind object.

To use the function, we need to indicate the path to the file, the names of the loci, the individual ID and the population names in the same order as in the original file.

loci_names <- paste0("LOCI-", as.character(1:20))

ind_names <- as.character(1:200)
pop_names <- as.character(1:10)
data_paru <- structure_to_genind(path = paste0(system.file('extdata', 
                                                           package = 'graph4lg'), 
                                               "/data_ex_str.txt"),
                                 loci_names = loci_names,
                                 pop_names = pop_names,
                                 ind_names = ind_names)
data_paru
#> /// GENIND OBJECT /////////
#> 
#>  // 200 individuals; 20 loci; 124 alleles; size: 139.5 Kb
#> 
#>  // Basic content
#>    @tab:  200 x 124 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 3-9)
#>    @loc.fac: locus factor for the 124 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]), 
#>     sep = "/", pop = pop, ploidy = ploidy)
#> 
#>  // Optional content
#>    @pop: population of each individual (group size range: 20-20)

gstud to genind

Packages gstudio and popgraph developed by R. Dyer (Dyer 2009) use as input data R data.frames with columns of class locus. These data.frame objects constitute gstud objects. Given these packages are often used to create genetic graphs, we created a function to convert them into the genind format.

A gstud object generally has the following structure (simulations with 10 populations as an example):

head(data_ex_gstud)

To convert it with the function gstud_to_genind, we indicate the name of the data.frame and of the columns containing population names and individual names:

gstud_to_genind(x = data_ex_gstud, pop_col = "POP",
                ind_col = "ID")
#> Individuals in 'x' were not ordered, they have
#>             been ordered by populations and populations ordered in alphabetic
#>             order for the conversion.
#> /// GENIND OBJECT /////////
#> 
#>  // 200 individuals; 20 loci; 124 alleles; size: 139.8 Kb
#> 
#>  // Basic content
#>    @tab:  200 x 124 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 3-9)
#>    @loc.fac: locus factor for the 124 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]), 
#>     sep = "/", pop = pop, ploidy = ploidy)
#> 
#>  // Optional content
#>    @pop: population of each individual (group size range: 20-20)

Genetic diversity assessment

Once a genetic dataset has been loaded in the R environment, the function pop_gen_index computes some genetic diversity indices at the population level.

gen_div <- pop_gen_index(data_ex_genind)
head(gen_div)

By default, it computes all the available indices, i.e. the number of individuals (“Nb_ind”), the total allelic richness (“A”), the expected heterozygosity (“He”) and the observed heterozygosity (“Ho”).

Distance calculations

In landscape genetics, apart from local population or landscape properties at the sampling site level, particular attention is given to between population/habitat patch distances. The package includes three functions to compute both genetic and landscape distances.

Genetic distances

From a genind object, the function mat_gen_dist calculates several types of between population genetic distances:

To do these calculations with the function mat_gen_dist, we just have to indicate the name of the genind object which includes the genetic data of the individuals as well as the populations to which each of them belongs. The other argument of the function is the type of genetic distance to compute. Here are two examples:

mat_dps <- mat_gen_dist(x = data_genind, dist = "DPS")
mat_dps[1:5, 1:5]
#>            1        10        11        12        13
#> 1  0.0000000 0.7883333 0.5900000 0.6816667 0.8733333
#> 10 0.7883333 0.0000000 0.5741667 0.5450000 0.8816667
#> 11 0.5900000 0.5741667 0.0000000 0.3750000 0.8658333
#> 12 0.6816667 0.5450000 0.3750000 0.0000000 0.8633333
#> 13 0.8733333 0.8816667 0.8658333 0.8633333 0.0000000
mat_pg <- mat_gen_dist(x = data_genind, dist = "PG")
mat_pg[1:5, 1:5]
#>           1       10       11       12       13
#> 1   0.00000 29.93508 21.67789 25.40865 29.17504
#> 10 29.93508  0.00000 19.15163 20.32432 29.99936
#> 11 21.67789 19.15163  0.00000 13.50181 24.69579
#> 12 25.40865 20.32432 13.50181  0.00000 25.18634
#> 13 29.17504 29.99936 24.69579 25.18634  0.00000

Geographical distances

The package also calculates Euclidean geographical distances between populations with the function mat_geo_dist. It takes as arguments:

When geographical coordinates are given in a projected coordinate reference system (metric), Pythagoras’s theorem is used to compute the distances. Conversely, when they are given in a polar CRS (degrees), Great Circle distances are computed. A warning message is displayed in every case.

Example with the 50 point projected coordinates included in the pts_pop_simul data set:

head(pts_pop_simul)
mat_geo <- mat_geo_dist(data = pts_pop_simul, 
                        ID = "ID", x = "x", y = "y",
                        crds_type = "proj")
#> Coordinates were treated as projected coordinates. Check whether
#>               it is the case.
mat_geo[1:5, 1:5]
#>           1         2        3         4         5
#> 1     0.000  3140.064 18555.86  7605.919  4810.405
#> 2  3140.064     0.000 15417.52 10278.619  6248.200
#> 3 18555.862 15417.522     0.00 25128.669 20209.404
#> 4  7605.919 10278.619 25128.67     0.000  5080.354
#> 5  4810.405  6248.200 20209.40  5080.354     0.000

Examples with 4 city polar coordinates:

city_us <- data.frame(name = c("New York City", "Chicago", 
                               "Los Angeles", "Atlanta"),
                   lat  = c(40.75170,  41.87440,
                            34.05420,  33.75280),
                   lon  = c(-73.99420, -87.63940,
                            -118.24100, -84.39360))

mat_geo_us <- mat_geo_dist(data = city_us,
                           ID = "name", x = "lon", y = "lat",
                           crds_type = "polar")
#> Coordinates were treated as polar coordinates. Check whether
#>               it is the case.

head(mat_geo_us)
#>                  Atlanta    Chicago Los Angeles New York City
#> Atlanta              0.0   367050.9    16225865       1168203
#> Chicago         367050.9        0.0    16590655       1523685
#> Los Angeles   16225865.1 16590655.5           0      15072934
#> New York City  1168203.4  1523684.7    15072934             0

Cost-distances

Because straight line Euclidean distances cannot accurately reflect how species move in landscapes, methods have been developed to better represent how they move by incorporating information about the resistance of landscape features to their movements. Among them, the computation of least-cost paths and of their cumulated cost-distances along these paths is a very popular approach.

The function mat_cost_dist makes possible the computation of cost-distances between a set of points on a raster with discrete cell values corresponding to land use types. Each land use type is given a cost reflecting its resistance to movement. A high value means that the cell is difficult to cross for an individual.

The raster input can be either:

The point (pts) input can be either:

In all cases, this object must have three data columns indicating the point ID, x (longitude) and y (latitude) in a projected coordinate reference system.

To do the computation, we need to create a data.frame with two columns (code and cost) specifying the cost associated with every raster cell value.

By default, the computation is performed with costDistance function from gdistance package (method = "gdistance"). For example :

x <- raster::raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
raster::values(x) <- sample(c(1,2,3,4), size = 100, replace = TRUE)
pts <- data.frame(ID = 1:4,
                  x = c(10, 90, 10, 90),
                  y = c(90, 10, 90, 10))
cost <- data.frame(code = 1:4,
                   cost = c(1, 10, 100, 1000))
mat_cd <- mat_cost_dist(raster = x,
              pts = pts, cost = cost,
              method = "gdistance")
head(mat_cd)
#>          1        2        3        4
#> 1  0.00000 45.66295  0.00000 45.66295
#> 2 45.66295  0.00000 45.66295  0.00000
#> 3  0.00000 45.66295  0.00000 45.66295
#> 4 45.66295  0.00000 45.66295  0.00000

gdistance cannot perform the computation anymore when the raster size and the number of points increase. Using another coding language than R is a way to overcome this limitation. That is why the function mat_cost_dist includes another method for computing cost distances. It relies upon the .jar file costdist-0.3.jar (Java language), developed by Gilles Vuidel (UMR TheMA, Besancon, France). It is automatically downloaded by the user at the first use of the function mat_cost_dist with the argument method = "java", provided Java is installed in the user’s machine. This program makes possible the parallelisation of the computation. The argument parallel.java takes values between 1 and the number of cores in the user’s machine - 1, thereby setting the number of cores used for the computation.

For example, when using method="java", the command should be:

mat_cost_dist(raster = x,
              pts = pts, cost = cost,
              method = "java",
              parallel.java = 2)

When method="java", least cost paths progress in 8 directions from one cell, whereas it can be controlled when method = "gdistance" with the argument direction=4 (or 8 or 16).

When choosing which method? Using method="java" becomes more and more interesting as the raster size and the number of points increase. In the previous example (raster: 10 \(\times\) 10 cells, 4 points), using method="gdistance" is the fastest option (0.67 s. vs 2.56 s.). With a raster of 100 \(\times\) 100 cells and 4 points, both methods are equivalent (2 s.). With a raster of 1000 \(\times\) 1000 cells and 4 points, method="java" is by far the fastest (120 s. vs 8 s.).

Besides, depending on the input data types, computation times can vary:

Processing tools

The functions presented in the last sections create objects in R, such as data.frame and matrix. The package includes functions to process these objects, ordering them, comparing them and converting them.

Reorder a matrix

In landscape genetics, when comparing two matrices of distances between the same sets of elements (e.g. doing a Mantel test to assess isolation by distance patterns), the two matrices must be ordered the same way. The function reorder_mat reorders a symmetric matrix according to a specified order of its row/column names.

For example, when computing a geographical distance from a data.frame in which the ID column contains integer values, the resulting distance matrix is ordered in the increasing order of these ID (1, 2, 3, …., 10). If the same IDs are character strings, the resulting distance matrix will be ordered in alphabetical order (“1”, “10”, “2”, “3”, …, “9”). In the following example, we illustrate how to reorder a matrix in such a case.

mat_g1 <- mat_geo_dist(pts_pop_ex, 
                       ID = "ID", x = "x", y = "y", 
                       crds_type = "proj")
#> Coordinates were treated as projected coordinates. Check whether
#>               it is the case.
head(mat_g1)
#>          1         2         3        4         5        6        7         8
#> 1     0.00 30050.125 21967.248 41694.12 26916.352 19313.21 37283.91 29098.797
#> 2 30050.12     0.000  8360.024 11800.42  6529.931 23825.41 19804.04 25766.063
#> 3 21967.25  8360.024     0.000 19746.39  5728.001 16843.99 20392.40 21332.135
#> 4 41694.12 11800.424 19746.392     0.00 15582.362 32572.99 20145.72 31270.753
#> 5 26916.35  6529.931  5728.001 15582.36     0.000 17629.80 15280.05 19293.781
#> 6 19313.21 23825.407 16843.990 32572.99 17629.804     0.00 20333.47  9790.812
#>          9       10
#> 1 33737.96 29230.81
#> 2 24485.91 30425.15
#> 3 22000.23 25242.03
#> 4 27789.39 36391.76
#> 5 18483.51 23900.84
#> 6 14827.34 10647.07
row.names(mat_g1)
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

# Reorder mat_g1
mat_g2 <- reorder_mat(mat_g1, 
                      order = as.character(1:10)[order(as.character(1:10))])
head(mat_g2)
#>           1       10         2         3        4         5        6        7
#> 1      0.00 29230.81 30050.125 21967.248 41694.12 26916.352 19313.21 37283.91
#> 10 29230.81     0.00 30425.154 25242.028 36391.76 23900.837 10647.07 18775.78
#> 2  30050.12 30425.15     0.000  8360.024 11800.42  6529.931 23825.41 19804.04
#> 3  21967.25 25242.03  8360.024     0.000 19746.39  5728.001 16843.99 20392.40
#> 4  41694.12 36391.76 11800.424 19746.392     0.00 15582.362 32572.99 20145.72
#> 5  26916.35 23900.84  6529.931  5728.001 15582.36     0.000 17629.80 15280.05
#>            8        9
#> 1  29098.797 33737.96
#> 10  5147.815 10348.43
#> 2  25766.063 24485.91
#> 3  21332.135 22000.23
#> 4  31270.753 27789.39
#> 5  19293.781 18483.51
row.names(mat_g2)
#>  [1] "1"  "10" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"

Convert Euclidean distances into cost distances

Cost-distances are expressed in cost units arbitrarily defined based on the cost values assigned to every land cover type when creating resistance surfaces. However, species dispersal distances are usually known as distances expressed in metric units. When we know that the probability that a species covers 10 km is 5 %, we can ask what is the equivalent of this distance in cost distance units according to the assumed cost value scenario. It can be useful in order to prune a landscape graph with a given distance threshold or simply to get an idea of the order of magnitude of cost-distance values.

To that purpose, a regression of the between population cost-distance values against the corresponding geographical distances can be performed. It estimates the relationship between both types of distance. Then, the resulting parameter estimates enable to convert a geographical distance into its cost-distance equivalent according to the cost scenario.

The function convert_cd performs the linear regression or log-log linear regression between the geographical distance matrix and the cost-distance matrix, in the same way as Tournant et al. (2013) and as performed by Graphab software.

Below, we estimate the relationship between geographical distance and cost-distance between the populations used to perform the gene-flow simulation. We convert 10 km into cost-distance units. The function also plots the relationship between these distances.

mat_ld <- reorder_mat(mat_ld, order = row.names(mat_geo))

convert_res <- convert_cd(mat_euc = mat_geo, mat_ld = mat_ld, 
                          to_convert = 10000, fig = TRUE, 
                          method = "log-log", pts_col = "grey")
convert_res
#> $`Converted values`
#>    10000 
#> 1605.543 
#> 
#> $`Model parameters`
#> Intercept     Slope 
#> -2.251216  1.045828 
#> 
#> $`Model multiple R-squared`
#> Multiple R-squared 
#>          0.6870757 
#> 
#> $Plot

In this case, we can see that 10 km are equivalent to 1.606 cost-distance units. The log-log linear model estimates the relationship between geographical distance (GD) and cost-distance (CD) such that: \(log(CD)=-2.2512+1.0458 \times log(GD)\). The determination coefficient \(R^2\) associated to this linear model is 0.69.

A figure is returned as the fourth element of convert_res. The black dot represented on this figure refers to the 10 km value on the regression line characterising the relationship between cost-distance and geographical distance.

From pairwise matrix to data frame

Distance values can be stored in two main ways. First, they can be stored in pairwise distance matrices. The distance between point \(i\) and point \(j\) is the value at the \(i_{th}\) row and the \(j_{th}\) column, and also at the \(j_{th}\) row and the \(i_{th}\) column. Such a matrix is the required input of analyses such as Mantel tests and can be used to create graphs when considered as an adjacency matrix.

However, it is quite a heavy object and these values can be stored in smaller objects called edge lists, which are basically tables with three columns: from, to, distance.

The function pw_mat_to_df converts a pairwise matrix into an edge list stored in a data.frame object.

df_dist <- pw_mat_to_df(pw_mat = mat_geo)
head(df_dist)

The resulting object df_dist is a data.frame with 4 columns: id_1, id_2, id_link and value.

From data frame to pairwise matrix

Conversely, the function df_to_pw_mat converts an edge list stored in a data.frame object into a pairwise matrix.

mat_dist <- df_to_pw_mat(data = df_dist, 
                         from = "id_1", to = "id_2", value = "value")
mat_dist[1:5, 1:5]
#>           1         2        3         4         5
#> 1     0.000  3140.064 18555.86  7605.919  4810.405
#> 2  3140.064     0.000 15417.52 10278.619  6248.200
#> 3 18555.862 15417.522     0.00 25128.669 20209.404
#> 4  7605.919 10278.619 25128.67     0.000  5080.354
#> 5  4810.405  6248.200 20209.40  5080.354     0.000

Conclusion

In the next tutorial, we present how to construct and analyse genetic graphs with graph4lg.

References

Bowcock, Anne M, A Ruiz-Linares, James Tomfohrde, Eric Minch, Judith R Kidd, and L Luca Cavalli-Sforza. 1994. “High Resolution of Human Evolutionary Trees with Polymorphic Microsatellites.” Nature 368 (6470): 455–57.
Dyer, Rodney J. 2009. “GeneticStudio: A Suite of Programs for Spatial Analysis of Genetic-Marker Data.” Molecular Ecology Resources 9 (1): 110–13.
Excoffier, Laurent, Peter E Smouse, and Joseph M Quattro. 1992. “Analysis of Molecular Variance Inferred from Metric Distances Among DNA Haplotypes: Application to Human Mitochondrial DNA Restriction Data.” Genetics 131 (2): 479–91.
Foltête, Jean-Christophe, Céline Clauzel, and Gilles Vuidel. 2012. “A Software Tool Dedicated to the Modelling of Landscape Networks.” Environmental Modelling & Software 38: 316–27.
Fortuna, Miguel A, Rafael G Albaladejo, Laura Fernández, Abelardo Aparicio, and Jordi Bascompte. 2009. “Networks of Spatial Genetic Variation Across Species.” Proceedings of the National Academy of Sciences 106 (45): 19044–49.
Hedrick, Philip W. 2005. “A Standardized Genetic Differentiation Measure.” Evolution 59 (8): 1633–38.
Jombart, Thibaut. 2008. “Adegenet: A r Package for the Multivariate Analysis of Genetic Markers.” Bioinformatics 24 (11): 1403–5.
Jost, Lou. 2008. “GST and Its Relatives Do Not Measure Differentiation.” Molecular Ecology 17 (18): 4015–26.
Landguth, Erin L, and SA Cushman. 2010. “CDPOP: A Spatially Explicit Cost Distance Population Genetics Program.” Molecular Ecology Resources 10 (1): 156–61.
Murphy, Melanie, Rodney Dyer, and Samuel A. Cushman. 2015. “Graph Theory and Network Models in Landscape Genetics.” In Landscape Genetics: Concepts, Methods, Applications, edited by Niko Balkenhol, Samuel Cushman, Andrew Storfer, and Lisette Waits, 1st ed., 165–80. John Wiley & Sons.
Paradis, Emmanuel. 2010. “Pegas: An r Package for Population Genetics with an Integrated–Modular Approach.” Bioinformatics 26 (3): 419–20.
Paschou, Peristera, Petros Drineas, Evangelia Yannaki, Anna Razou, Katerina Kanaki, Fotis Tsetsos, Shanmukha Sampath Padmanabhuni, et al. 2014. “Maritime Route of Colonization of Europe.” Proceedings of the National Academy of Sciences.
Pritchard, Jonathan K, Matthew Stephens, and Peter Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.” Genetics 155 (2): 945–59.
Raymond, Michel. 1995. “GENEPOP: Population Genetics Software for Exact Tests and Ecumenism. Vers. 1.2.” Journal of Heredity 86: 248–49.
Rousset, François. 1997. “Genetic Differentiation and Estimation of Gene Flow from f-Statistics Under Isolation by Distance.” Genetics 145 (4): 1219–28.
Shirk, AJ, EL Landguth, and SA Cushman. 2017. “A Comparison of Indiviudal-Based Genetic Distance Metrics for Landscape Genetics.” Molecular Ecology 17 (6).
Tournant, Pierline, Eve Afonso, Sébastien Roué, Patrick Giraudoux, and Jean-Christophe Foltête. 2013. “Evaluating the Effect of Habitat Connectivity on the Distribution of Lesser Horseshoe Bat Maternity Roosts Using Landscape Graphs.” Biological Conservation 164: 39–49.
Weir, Bruce S, and C Clark Cockerham. 1984. “Estimating f-Statistics for the Analysis of Population Structure.” Evolution 38 (6): 1358–70.