mine_gmatrix
functionrun_farm_sim
function
The resevol R package is used to simulate individual-based models of agricultural production, pesticide application, and the evolution of pesticide resistance on a spatially explicit landscape. Individual pests are modelled with any number of quantitative traits that have a customisable covariance structure. Covarying trait values are underlied by a pre-specified number of loci and a complex network mapping loci to traits, all of which are specified at the individual level in explicit pest genomes. Simulation requires two steps, each carried out by a separate R function. Step one is run with the mine_gmatrix
function; it finds individual genomes that produce appropriately covarying traits given allele values that are drawn from a standard normal distribution. Step two is run with the run_farm_sim
function; it initialises individuals with the mine_gmatrix
output, then simulates agricultural management in the presence of evolving pests. Here we demonstrate the use of these two functions and offer tips for doing so most effectively.
mine_gmatrix
functionThe goal of the mine_gmatrix
function is to map a set of random standard normal (mean = 0, stdev = 1) values to a set of covarying values using a network of hidden layers. The set of random standard normal values in this case are allele values for a positive integer value of loci. The set of covarying values are the traits that take a pre-specified covariance structure, which is represented by a square variance covariance matrix. An example below includes 12 loci (green circles) and four traits (red diamonds), which are linked through three hidden layers of nodes (blue squares).
Each arrow in the figure above is initialised as a number in the mine_gmatrix
function. An evolutionary algorithm is then used to find a set of arrow values that produces traits with the pre-specified covariance structure given a set of randomly drawn standard normal allele values at loci. The details of this evolutionary algorithm are explained in a separate vignette, but the general idea is that a large set of random networks (i.e., arrow values in the diagram above) are initialised, and the networks that produce traits with a covariance structure most similar to the pre-specified one produce a new generation of networks with some degree of mutation and recombination. Generations continue to evolve until either the maximum number of allowed generations is reached or the network produces traits with a covariance structure that is sufficiently close to the pre-specified one. As an example, we can consider the covariance matrix below.
## [,1] [,2] [,3] [,4]
## [1,] 1.0 -0.5 0.2 0.2
## [2,] -0.5 1.0 0.2 0.2
## [3,] 0.2 0.2 1.0 -0.5
## [4,] 0.2 0.2 -0.5 1.0
This matrix models the pre-specified covariance among four traits. We can assign it as gmt
and use it in the mine_gmatrix
function to find a network that will produce traits with this covariance structure given random standard normal allele values at each loci.
mg <- mine_gmatrix(gmatrix = gmt, loci = 12, indivs = 2000, npsize = 12000,
max_gen = 1200, sampleK = 1200, chooseK = 6, layers = 6,
mu_pr = 0.05, mu_sd = 0.01, pr_cross = 0.05,
term_cri = -5.3, sd_ini = 0.1, use_cor = FALSE);
The above use of mine_gmatrix
includes arguments that specify the details of the evolutionary algorithm.
Argument | Description |
---|---|
gmatrix |
The pre-specified trait covariance matrix. This will define what the covariance will be between each trait when allele values are drawn from a standard normal distribution. |
loci |
The number of loci for an individual (green circles in Figure 1). Simulations can allow for both haploid and diploid individuals. Allele values at each loci affect trait values (red squares in Figure 1) through a network of intermediary nodes (blue squares in Figure 1). |
indivs |
The number of individuals initialised in each generation of the evolutionary algorithm to test among-individual trait correlations. Individuals are initialised with allele values (green circles in Figure 1) drawn from a standard normal distribution. |
npsize |
The size of the population of networks in each generation of the evolutionary algorithm. Each network (Figure 1) is a discrete individual in the population. |
max_gen |
The maximum number of generations that the evolutionary algorithm is allowed to run before terminating (regardless of how well the evolved covariance structure matches the pre-specified gmatrix ). |
sampleK |
During a round of selection, the number of random networks chosen to compete in a tournament. A single generation will include as many tournaments as necessary to create a new network population of size npsize . |
chooseK |
During a round of selection tournament, the number of networks within the sampleK random subset of the tournament that have the highest fitness will be selected to populate the next generation of networks |
layers |
The number of hidden layers in the network linking loci to traits (blue squares in Figure 1) |
mu_pr |
The probability that a value in the network (black arrows in Figure 1) will mutate in a generation. Mutation events change the existing value by adding a new value drawn from a normal distribution with a mean of 0 and standard deviation of mu_sd . |
mu_sd |
The standard deviation of the random normal value mean centered at 0 that is added to the existing value of the network (black arrows in Figure 1) when a mutation event occurs. |
pr_cross |
The probability that a focal network in the population will initiate a crossover of a subset of its values with a randomly selected second network (note that any given network might therefore be included in more than one crossover event in a generation). The size of the subset is determined randomly (conceptually equivalent to drawing a rectangle around arrows in Figure 1, with values within the rectangle then swapped between networks). |
term_cri |
The criteria for terminating the evolutionary algorithm. The algorithm will terminate if a network is found in which the mean squared deviation of the covariance matrix elements from gmatrix is less than exp(term_crit) . |
sd_ini |
The standard deviation of initialised network values at the start of the evolutionary algorithm. All network values are initialised by randomly sampling from a normal distribution with a mean of 0 and a standard deviation of sd_ini . |
use_cor |
Should the gmatrix be treated as a correlation matrix rather than a covariance matrix when calculating fitness? |
Table: Arguments taken by the mine_gmatrix
function.
Running the mine_gmatrix
function can take several minutes to several hours (or longer) depending on the parameters used. The time required scales exponentially with the dimensions of gmatrix
(i.e., the number of traits that need to be found). Default argument values are specified in attempt to be efficient across a wide range of conditions, but in practice it is often best to experiment with different values. As with any evolving population, selection is more effective in larger populations (i.e., high npsize
), but large populations are also more time consuming to simulate. The indivs
argument can be increased to give more precision when comparing the fitness of networks. Values of sd_ini
are usually most effective when they are roughly an order of magnitude below that of the largest element value of gmatrix
. Values of sampleK
, chooseK
, mu_pr
, mu_sd
, and pr_cross
sometimes require a bit more experimentation to maximise the efficiency of the function in finding solutions. Note that mu_pr
and pr_cross
are often most effective at values much higher than would be expected in real biological organisms [@Hamblin2013].
When mine_gmatrix
is running, the generation number, mean stress (i.e., log mean squared deviation of the covariance matrix elements from gmatrix
), and minimum stress will be printed in the console (this can be turned off by specifying prnt_out = FALSE
). Below shows how the console output will appear.
===============================================
Initialising gmatrix mining...
===============================================
Gen: 0 Stress: -0.763570 Min: -0.763570
Gen: 1 Stress: -0.763570 Min: -0.763570
Gen: 2 Stress: -0.763570 Min: -0.763570
Gen: 3 Stress: -0.763570 Min: -0.763570
Gen: 4 Stress: -0.763570 Min: -0.763571
Gen: 5 Stress: -0.763570 Min: -0.763571
Gen: 6 Stress: -0.763571 Min: -0.763572
As previously mentioned, finding a network can be time-consuming. The resevol package comes with some example pre-installed.
data("gmatrices");
We can use mg_v1
as an example of output from mine_gmatrix
. In mg_v1
, the pre-specified covariance matrix is the same as the gmt
matrix used above. The output includes eight elements, the first of which includes all of the argument inputs to the mine_gmatrix
function.
## [1] 12.00 6.00 2000.00 12000.00 0.05 0.01 2400.00 0.05
## [9] 1200.00 6.00 -5.30 0.10 0.00 1.00
Argument inputs in the elements above include (1) loci
, (2) layers
, (3) indivs
, (4) npsize
, (5) mu_pr
, (6) mu_sd
, (7) max_gen
, (8) pr_cross
, (9) sampleK
, (10) chooseK
, (11) term_cri
, (12) sd_ini
, (13) use_cor
, and (14) prnt_out
. The second element of mine_gmatrix
output is the pre-specified covariance matrix gmatrix
.
## [,1] [,2] [,3] [,4]
## [1,] 1.0 -0.5 0.2 0.2
## [2,] -0.5 1.0 0.2 0.2
## [3,] 0.2 0.2 1.0 -0.5
## [4,] 0.2 0.2 -0.5 1.0
The third element of mine_gmatrix
output is a matrix with elements that specify the effect of each locus value (rows) on the first hidden layer (columns). Conceptually, these are the values of the Figure 1 arrows mapping the green circles to the first column of blue squares.
## [,1] [,2] [,3] [,4]
## [1,] -0.62181158 -0.23246088 1.027398e-01 0.36089789
## [2,] 0.11516718 0.35628744 1.932607e-06 0.38293051
## [3,] -0.33493318 0.32627944 6.940649e-02 -0.30558144
## [4,] -0.07520604 -0.32440111 3.741366e-01 0.24383415
## [5,] -0.51259791 -0.06048184 -8.689474e-01 0.60746638
## [6,] 0.21803456 -0.38792630 2.014267e-01 -0.42601954
## [7,] -0.22419755 -0.84531007 5.902170e-01 -0.10942012
## [8,] 0.73822815 -0.30174497 2.573475e-01 0.23633850
## [9,] -0.45722066 0.09532205 5.540713e-01 -0.06608855
## [10,] 0.28739580 -0.12243227 -1.553612e-01 -0.01861052
## [11,] -0.09843707 -0.18971567 6.172619e-01 -0.17746038
## [12,] 0.64383533 -0.19503881 -1.054181e-01 -0.37567049
The fourth element of mine_gmatrix
output is a three dimensional array that specifies the effects between all hidden layers, and between the last hidden layer and the trait values (i.e., in Figure 1, all of the arrows to the right of the first column of blue squares). The third dimension of the array below separates the sets arrows between hidden layers (i.e., each matrix below corresponds to a map between hidden layers, or between the last hidden layer and the traits). Note that while Figure 1 includes only three hidden layers, there are six below.
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 0.1876383 -0.7766931 0.2561096 -0.1594451
## [2,] -0.7218431 -0.1720015 0.1051845 -0.1391646
## [3,] -0.5724185 -0.6303034 -0.1351206 -0.1521792
## [4,] -0.1518009 -0.4726420 0.6897870 -0.2687902
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] -0.89598885 -0.2361661 0.03713508 -0.4339221
## [2,] -0.03818028 0.1322045 0.52982046 -0.2206081
## [3,] -0.16489913 0.2119842 0.44171859 0.7881612
## [4,] 0.35726940 -0.5271966 0.13122182 -0.2990653
##
## , , 3
##
## [,1] [,2] [,3] [,4]
## [1,] 0.41092320 0.28480202 -0.1120791 -0.4511020
## [2,] 0.19794635 0.70486627 -0.2855835 -0.3474222
## [3,] -0.84575895 -0.06894474 0.2846113 -0.5004831
## [4,] 0.08297854 0.31942452 0.8976090 0.4573121
##
## , , 4
##
## [,1] [,2] [,3] [,4]
## [1,] 0.2253143 0.2004491 0.74271147 1.0243443
## [2,] -0.7223198 0.5598523 0.41619436 -0.1152639
## [3,] -0.2417876 0.5954666 -0.03304092 -0.4663887
## [4,] 0.4589945 0.5626436 -0.42824309 -0.3633294
##
## , , 5
##
## [,1] [,2] [,3] [,4]
## [1,] 0.05722936 -0.7678113 0.7009198 0.15491320
## [2,] 0.08709396 -0.6969477 0.4913277 -0.73493561
## [3,] -0.70780111 0.7518397 0.4779771 0.41039047
## [4,] -0.18146366 -0.6877152 0.4546469 -0.01592507
##
## , , 6
##
## [,1] [,2] [,3] [,4]
## [1,] -0.484868322 0.2352634 -0.5067399 0.51615705
## [2,] 1.272293446 -0.7279591 -0.4885374 1.19602908
## [3,] 0.653742180 -0.6679193 0.2497727 -0.05637619
## [4,] -0.008199929 1.2147659 0.9983367 0.26934001
The fifth element of mine_gmatrix
is a matrix that shows the marginal effect of each locus (rows) on each trait (columns) value. Note that this is this is the product of mg_v1[[3]]
times each layer of mg_v1[[4]]
in sequence (i.e., mg_v1[[4]][,,1]
times mg_v1[[4]][,,2]
, and so forth). Conceptually, this matrix isolates the effect of each locus on each trait if all other loci are held at a constant value.
## [,1] [,2] [,3] [,4]
## [1,] 0.11355835 0.002908792 -0.122576128 0.24062456
## [2,] 0.19860768 -0.560485161 -0.238001247 -0.13433740
## [3,] 0.35488346 0.057368027 0.251966132 0.15915978
## [4,] -0.01866734 -0.145837880 -0.007194782 -0.16074206
## [5,] -0.23320888 0.208434036 -0.643175344 0.63047911
## [6,] -0.24710427 0.412170404 0.305025482 -0.13385957
## [7,] -0.21817543 0.395054251 0.293741747 -0.11102551
## [8,] -0.33650514 -0.312956818 -0.128296727 -0.52754011
## [9,] 0.46183511 -0.197388878 0.295203638 -0.03835963
## [10,] -0.24765628 0.105117973 -0.078117455 -0.06127897
## [11,] 0.17745685 -0.052061251 0.344817622 -0.22442234
## [12,] -0.40187122 0.296093799 0.120292139 -0.22079164
The sixth element of mine_gmatrix
is a matrix of the observed covariance structure of traits given a set of indivs
individuals with random standard normal values at each locus. This matrix can be compared to the pre-specified gmatrix
. While there will always be small deviations that are minimised by the function, it might often be useful to check the difference between matrices manually.
## [,1] [,2] [,3] [,4]
## [1,] 0.9936103 -0.5106387 0.2324040 0.2367691
## [2,] -0.5106387 1.0193888 0.2573216 0.2723028
## [3,] 0.2324040 0.2573216 0.9334559 -0.4480018
## [4,] 0.2367691 0.2723028 -0.4480018 0.9689539
The seventh element of mine_gmatrix
is a vectorisation of mg_v1[[3]]
and mg_v1[[4]]
. That is, a vector that includes all of the information needed to map a set of allele values to trait values (i.e., values for all arrows shown in Figure 1). This vector of network values is inserted into the genome of each individual during a simulation, so each individual's allele values can then be mapped to their appropriate trait values. Simulations can then be run in which these network values are static, or in which they may evolve at any number of paths from layer to layer (i.e., columns of arrows in Figure 1).
## [1] -6.218116e-01 -2.324609e-01 1.027398e-01 3.608979e-01 1.151672e-01
## [6] 3.562874e-01 1.932607e-06 3.829305e-01 -3.349332e-01 3.262794e-01
## [11] 6.940649e-02 -3.055814e-01 -7.520604e-02 -3.244011e-01 3.741366e-01
## [16] 2.438342e-01 -5.125979e-01 -6.048184e-02 -8.689474e-01 6.074664e-01
## [21] 2.180346e-01 -3.879263e-01 2.014267e-01 -4.260195e-01 -2.241976e-01
## [26] -8.453101e-01 5.902170e-01 -1.094201e-01 7.382282e-01 -3.017450e-01
## [31] 2.573475e-01 2.363385e-01 -4.572207e-01 9.532205e-02 5.540713e-01
## [36] -6.608855e-02 2.873958e-01 -1.224323e-01 -1.553612e-01 -1.861052e-02
## [41] -9.843707e-02 -1.897157e-01 6.172619e-01 -1.774604e-01 6.438353e-01
## [46] -1.950388e-01 -1.054181e-01 -3.756705e-01 1.876383e-01 -7.766931e-01
## [51] 2.561096e-01 -1.594451e-01 -7.218431e-01 -1.720015e-01 1.051845e-01
## [56] -1.391646e-01 -5.724185e-01 -6.303034e-01 -1.351206e-01 -1.521792e-01
## [61] -1.518009e-01 -4.726420e-01 6.897870e-01 -2.687902e-01 -8.959888e-01
## [66] -2.361661e-01 3.713508e-02 -4.339221e-01 -3.818028e-02 1.322045e-01
## [71] 5.298205e-01 -2.206081e-01 -1.648991e-01 2.119842e-01 4.417186e-01
## [76] 7.881612e-01 3.572694e-01 -5.271966e-01 1.312218e-01 -2.990653e-01
## [81] 4.109232e-01 2.848020e-01 -1.120791e-01 -4.511020e-01 1.979463e-01
## [86] 7.048663e-01 -2.855835e-01 -3.474222e-01 -8.457590e-01 -6.894474e-02
## [91] 2.846113e-01 -5.004831e-01 8.297854e-02 3.194245e-01 8.976090e-01
## [96] 4.573121e-01 2.253143e-01 2.004491e-01 7.427115e-01 1.024344e+00
## [101] -7.223198e-01 5.598523e-01 4.161944e-01 -1.152639e-01 -2.417876e-01
## [106] 5.954666e-01 -3.304092e-02 -4.663887e-01 4.589945e-01 5.626436e-01
## [111] -4.282431e-01 -3.633294e-01 5.722936e-02 -7.678113e-01 7.009198e-01
## [116] 1.549132e-01 8.709396e-02 -6.969477e-01 4.913277e-01 -7.349356e-01
## [121] -7.078011e-01 7.518397e-01 4.779771e-01 4.103905e-01 -1.814637e-01
## [126] -6.877152e-01 4.546469e-01 -1.592507e-02 -4.848683e-01 2.352634e-01
## [131] -5.067399e-01 5.161570e-01 1.272293e+00 -7.279591e-01 -4.885374e-01
## [136] 1.196029e+00 6.537422e-01 -6.679193e-01 2.497727e-01 -5.637619e-02
## [141] -8.199929e-03 1.214766e+00 9.983367e-01 2.693400e-01
The eighth and final element of mine_gmatrix
reports the stress of the network produced. That is, the logged expected squared deviation of each element in the observed covariance matrix from the pre-specified covariance matrix (note, small deviations might arise between this value and the value obtained from manually comparing mg_v1[[2]]
and mg_v1[[6]]
due to a different set of indivs
individuals being used to produce mg_v1[[6]]
and calculate mg_v1[[8]]
below).
## [1] -6.237817
The output of mine_gmatrix
is necessary for initialising new individuals in the individual-based model. The individual-based model is run using the run_farm_sim
function, which is explained in the next section.
run_farm_sim
functionSimulations are run using the run_farm_sim
function, which initialises a spatially explicit landscape and individual pests, then simulates the population and evolutionary dynamics of those pests for a specific landscape management regime of crop use and pesticide application. The simulation terminates after a fixed number of time steps have passed, or if the population goes extinct. Because of the way that events occur within the simulation (e.g., demographic processes of individuals, changes in landscape properties), time steps do not have a fixed interpretation (e.g., one growing season or year). Instead, arguments to run_farm_sim
must be carefully considered to effectively model the social-ecological system. Similarly, the order of events in the life-cycle of individuals is not inherently fixed, but is instead defined by the minimum and maximum age at which movement, reproduction, and feeding occur for individuals in the population. How this works is presented in more detail with the explanation of time steps in run_farm_sim
. We first explain the inputs.
run_farm_sim
The run_farm_sim
function takes arguments that specify simulation parameter values and produces simulation output mainly in the form of CSV files printed to the R working directory. Arguments to run_farm_sim
are explained in the table below.
Argument | Description |
---|---|
N |
The number of individuals that are initialised in a simulation. Individuals are initialised in a random location on the landscape, and at least two individuals are needed. |
xdim |
The number of cells in the horizontal dimension of the landscape This value must be an integer greater than two. |
ydim |
The number of cells in the vertical dimension of the landscape This value must be an integer greater than two. |
repro |
The type of reproduction that individuals undergo in the simulation. There are three options: (1) “asexual”, in which individuals reproduce clonally and offspring have haploid genomes and traits identical to their mother with the potential for mutation; (2) “sexual”, in which individuals are monoecious (both female and male) and offspring have diploid genomes with alleles inherited from both parents with mutation and recombination; (3) “biparental”, in which individuals are dioecious (only female or male) and offspring have diploid genomes with alleles inherited from both parents with mutation and recombination. |
neutral_loci |
The number of loci that are completely neutral (i.e., have no effect on fitness). These loci can be used to monitor genetic drift or calculate inbreeding coefficients. |
max_age |
This is the maximum number of time steps that an individual can survive. Individuals that are older than this age in a time step will always die. |
min_age_move |
This is the minimum age at which an individual can move. Individuals below this age will always remain on their current cell. |
min_age_move |
This is the maximum age at which an individual can move. Individuals above this age will always remain on their current cell. |
min_age_reproduce |
This is the minimum age at which an individual can be reproductively active. No individuals below this age will engage in any reproductive activity, nor will they be recognised as potential mates by other individuals. |
max_age_reproduce |
This is the maximum age at which an individual can be reproductively active. No individuals above this age will engage in any reproductive activity, nor will they be recognised as potential mates by other individuals. |
min_age_feed |
This is the minimum age at which an individual can eat. No individuals below this age will be able to consume food on the landscape. |
max_age_feed |
This is the maximum age at which an individual can eat. No individuals above this age will be able to consume food on the landscape. |
food_consume |
This defines how much food an individual will consume from the cell on which it is feeding. Food consumption can take on any positive real value, and an individual will consume up to this amount if possible (if not, they will consume however much food is left within their landscape cell). |
pesticide_consume |
This defines how much pesticide an individual will consume from the cell on which it resides. Pesticide consumption can take on any positivie real value, and an individual will consume up to this amount if possible (if not, they will consume however much pesticide has been placed on the landscape cell). |
rand_age |
This argument determines whether individuals in the simulation will be initialised with a random age selected uniformly from zero to max_age . If FALSE , then all individuals will be initialised at age zero. |
move_distance |
This is the maximum number of cells that an individual can move, in any direction, on the landscape during one bout of movement. |
food_needed_surv |
This is the amount of food that an individual needs to consume to survive. If the individual has not consumed this amount of food before the age of age_food_threshold , then they will die in the time step. |
pesticide_tolerated_surv |
This is the amount of pesticide that an individual can tolerate and still survive. If the individual has consumed more than this amount of pesticide on or after the age of age_pesticide_threshold , then they will die in the time step. |
food_needed_repr |
This is the amount of food that an individual needs to produce one offspring. The total number of offspring that an individual produces in a time step is the floor value of their food consumption divided by this value. |
pesticide_tolerated_repr |
This is the amount of pesticide tolerated below which an individual can reproduce. Note that individuals above the threshold can still mate and sire offspring, |
reproduction_type |
This determines how individuals reproduce; the two options are “lambda” and “food_based”. If “lambda”, then the number of offspring an individual produces is sampled from a Poisson distribution with a fixed rate parameter lambda_value (potentially adjusted by other factors in the simulation). If “food_based”, then the number of offspring produced is based on the amount of food consumed by the individual. |
mating_distance |
This is the distance in cells (any direction) away from a focal individual from which they can successfully find and identify a mate (e.g., if 0, then only individuals on the same cell are potential mates). |
lambda_value |
This is the rate parameter for the Poisson sampling of offspring number; it only applies when reproduction_type is set to “lambda”. |
movement_bouts |
This is the number of times an individual can move in a single time step (i.e., the number of cells that it can potentially visit). Each time an individual visits a new cell, it can potentially feed or consume pesticide. |
selfing |
This determines whether or not self-fertilisation is allowed when repro is set to 'sexual'. |
feed_while_moving |
If TRUE , then individuals will feed in each movement bout when they arrive to a new landscape cell. |
pesticide_while_moving |
If TRUE , then individuals will consume pesticide in each movement bout when they arrive to a new landscape cell. |
mortality_type |
This determines how mortality is enacted in the simulation. Currently there is only one mortality type possible; mortality occurs if individuals exceed their maximum age, do not consume enough food, or consume too much pesticide. |
age_food_threshold |
This is the age at which mortality associated with feeding is enacted, so an individual younger than this age will not die if they have not yet consumed sufficient food to satisfy food_needed_surv . |
age_pesticide_threshold |
This is the age at which mortality associated with pesticide consumption is enacted, so an individual younger than this age will not die even if they have exceeded their pesticide threshold. |
farms |
This is the number of farms to be placed on the landscape. Farms are placed in blocks of roughly equal sizes using a shortest splitline algorithm. Farms operate independently in terms of what crops they grow and pesticides they apply. |
time_steps |
This is the number of time steps that a simulation will run. Simulations will be terminated before this number if extinction occurs. |
mutation_pr |
This is the probability of mutation occurring at any locus of a newly produced offspring. |
crossover_pr |
This is the probability of crossover between two homologous loci. This only applies for diploid genomes. |
mutation_type |
This determines how mutation is modelled. If 0, then a completely new allele value is drawn from a normal distribution with a mean of mutation_direction and a standard deviation of 1 (or 1 / sqrt(2) for diploids, so that the expected standard devation of the sum of both allele values is 1). If 1, then a new value is drawn from a normal distribution with mean mutation_direction and standard deviation of 1, and this new value is then added to the existing allele value. |
net_mu_layers |
This is the proportion of the genome that can evolve. If 0, then only loci values (green circles in Figure 1) can mutate. If 1, then loci and the first column of arrows (green circles to first column of blue squares in Figure 1) can mutate. If 2, then the first two columns of arrows in Figure 1 can mutate, and so forth. Fewer mutation layers will constrain the covariance among traits, while more mutation layers will allow the covariance structure to evolve more readily. |
net_mu_dir |
The direction along the network in which net_mu_layers applies (not loci, green circles in Figure 1, can always mutate). If 1, then net_mu_layers applies in the direction from loci to traits. If 0, then the direction applies from traits to loci (i.e., net_mu_dir = 0 and net_mu_layers = 1 would mean that only the arrow values between the last hidden layer and traits in Figure 1 could mutate). |
mutation_direction |
This allows mutations to be biased in one direction. A default value of 0 makes positive or negative allele values equally likely. |
crop_rotation_type |
This determines how crop types are rotated across the landscape. A value of 1 means that crops will never rotate, while a value of 2 means that a new crop type will be randomly chosen every crop_rotation_time time steps. |
crop_rotation_time |
This determines how many time steps a crop is left before being refreshed and potentially changed. Note that even if the crop type does not change, this value still has the effect of determining how often crops are replenished (if some have been eaten since the last time they were replenished). |
pesticide_rotation_type |
This determines how pesticide types are rotated across the landscape. A value of 1 means that pesticides never rotate, while a value of 2 means that a new pesticide type will randomly chosen every pesticide_rotation_time time steps. |
pesticide_rotation_time |
This determines how many time steps a pesticide is left before being replenished and potentially changed. Note that unlike crops, pesticide levels do not decrease on the landscape over time (e.g., with consumption). |
crop_per_cell |
This determines the expected amount of crop that is placed on a single landscape cell. The more crop on a cell, the more that can be potentially consumed by individuals. |
pesticide_per_cell |
This determines how much pesticide is placed on a single landscape cell. The higher concentration of pesticide per cell, the more that individuals on the cell will imbibe and potentially be affected by. |
crop_sd |
This is the standard deviation of crop number placed on landscape cells. A default value of 0 assumes that all cells have the same amount of crop. |
pesticide_sd |
This is the standard deviation of pesticide applied to each landscape cell. A default value of 0 assumes that each cell has the same concentration of pesticide applied. |
crop_min |
This is the minimum amount of crop that is possible to have on a single cell (i.e., crop values will never be initialised to be lower than this value). |
crop_max |
This is the maximum amount of crop that is possible to have on a single cell (i.e., crop values will never be initialised to be higher than this value). |
pesticide_min |
This is the minimum concentration of pesticide that is possible to have on a single cell (i.e., pesticide values will never be initialised to be lower than this value). |
pesticide_max |
This is the maximum concentration of pesticide that is possible to have on a single cell (i.e., pesticide values will never be initialised to be higher than this value). |
crop_number |
This is the number of unique crops that can exist on the landscape during the course of a simulation. The maximum number of possible crops is 10. |
pesticide_number |
This is the number of unique pesticides that can exist on the landscape during the course of a simulation. The maximum number of possible pesticides is 10. |
print_inds |
If TRUE , a CSV file will print in the working directory with every individual and all of their characteristics (i.e., locations, traits, genomes) in every time step. By default, this is set to FALSE and should only be set to TRUE with extreme caution, as large populations persisting over long periods of time can produce extremely large CSV files. |
print_gens |
If TRUE , the time step and the population size will be printed to the R console as the simulation is running. |
print_last |
If TRUE , a CSV file will print in the working directory with every individual and all of their characteristics (i.e., locations, traits, genomes) in only the last time step. Note that for large populations, the file size generated can be very large (10s to 100s of GBs). |
K_on_birth |
This is a carrying capacity applied to new individuals across the entire landscape. If the total number of offspring in a time step exceeds this value, then offspring are removed at random until the total number of new offspring equals K_on_birth . In practice, this can help speed up simulations by avoiding the unnecessary production of individuals when most will perish. |
pesticide_start |
This is the time step at which pesticide begins to be applied. No pesticide will be applied prior to this start time, so individuals will not experience any effects of pesticide. This can be useful as a tool to burn in the population prior to introducing pesticide. |
immigration_rate |
This is the number of immigrant individuals arriving in the landscape in each time step. Immigrants are initialised in random locations with the same network structure (Figure 1) as individuals initialised at the start of the simulation, and with allele values randomly drawn from a standard normal distribution. |
get_f_coef |
This determines whether or not inbreeding coefficients will be calculated for sexual populations and printed off in CSV files. Because this can add some computation time, it is best to set to FALSE unless it is needed. |
get_stats |
If TRUE , a CSV file will print in the working directory with summary statistics for each time step. This is set to TRUE by default. |
metabolism |
This determines the rate at which food consumed in previous time steps is lost in subsequent time steps, which can be especially relevant if food consumed determines survival or reproductive output. Values of 0 mean that stored gains will always persist throughout an individual's lifetime, while very high values will model the gains of one time step being wiped out in subsequent time steps (if, e.g., the objective is to model individuals needing to consume food successfully in each time step to survive or reproduce, as opposed to having a feeding life history stage followed by a mating and reproduction stage). |
baseline_metabolism |
This fixes a baseline metabolic rate at which food consumed in previous time steps is lost in subsequent steps. This fixed value is always added to metabolism for each individual. By default, this value is 0. |
min_age_metabolism |
This determines the minimum age at which losses of food consumed in previous time steps enacted by metabolism and baseline_metabolsim can occur. |
max_age_metabolism |
This determines the maximum age at which losses of food consumed in previous time steps enacted by metabolism and baseline_metabolsim can occur. |
This large list of potential inputs can make setting up simulations challenging, but the options are intended to give a high degree of flexibility in the type of biological system being modelled.
run_farm_sim
To model a species with non-overlapping generations that moves around the landscape and has a reproductive success that is determined by their food consumption, parameters could be set with a max_age
of one, high movement_bouts
, and no age related threshold of feeding, reproducing, or pesticide consumption. Below shows a hypothetical set of parameter combinations that might fulfil this criteria (note that maximum age thresholds for feeding, reproducing, or pesticide consumption are 9 by default).
sim <- run_farm_sim(mine_output = mg_v1, N = 1000, neutral_loci = 1000,
xdim = 64, ydim = 64, repro = "sexual", max_age = 1,
selfing = TRUE, food_consume = c(2, 2),
pesticide_consume = c(0, 0), food_needed_surv = 1,
food_needed_repr = 2, reproduction_type = "food_based",
pesticide_tolerated_surv = 0, pesticide_rotation_type = 2,
crop_rotation_type = 2, min_age_reproduce = 0,
farms = 24, time_steps = 40, mutation_pr = 0.001,
crossover_pr = 0.1, net_mu_layers = 0,
crop_rotation_time = 1, pesticide_rotation_time = 1,
crop_per_cell = 1, pesticide_per_cell = 0.4,
crop_number = 2, pesticide_number = 2, print_inds = FALSE,
K_on_birth = 1000000, min_age_move = 0,
age_food_threshold = 0, min_age_feed = 0,
pesticide_start = 20, print_last = TRUE,
mating_distance = 2, immigration_rate = 0, metabolism = 0,
baseline_metabolism = 0, movement_bouts = 4,
feed_while_moving = TRUE, pesticide_while_moving = TRUE);
To model a similar system with overlapping generations, parameters could be set with max_age > 1
and a very high baseline_metabolism
(i.e., high enough so that food consumption gains can only apply to the current time step), and otherwise identical parameters. The function above has been modified slightly to fulfil this criteria by increasing max_age
to 4 and setting baseline_metabolism = 10000
, but otherwise keeping the same arguments.
sim <- run_farm_sim(mine_output = mg_v1, N = 1000, neutral_loci = 1000,
xdim = 64, ydim = 64, repro = "sexual", max_age = 4,
selfing = TRUE, food_consume = c(2, 2),
pesticide_consume = c(0, 0), food_needed_surv = 1,
food_needed_repr = 2, reproduction_type = "food_based",
pesticide_tolerated_surv = 0, pesticide_rotation_type = 2,
crop_rotation_type = 2, min_age_reproduce = 0,
max_age_feed = 1, farms = 24, time_steps = 40,
mutation_pr = 0.001, crossover_pr = 0.1, net_mu_layers = 0,
crop_rotation_time = 1, pesticide_rotation_time = 1,
crop_per_cell = 1, pesticide_per_cell = 0.4,
crop_number = 2, pesticide_number = 2, print_inds = FALSE,
K_on_birth = 1000000, min_age_move = 0,
age_food_threshold = 0, min_age_feed = 0,
pesticide_start = 20, print_last = TRUE,
mating_distance = 2, immigration_rate = 0, metabolism = 0,
baseline_metabolism = 10000, movement_bouts = 4,
feed_while_moving = TRUE, pesticide_while_moving = TRUE);
To model a system in which individuals develop in a sedentary life history stage that persists until a final stage of movement and reproduction (e.g., an insect with a larval and pupal stage), parameters could be set with min and max ages of feeding and pesticide consumption from 0 to 3, with min and max ages of movement and reproduction set at 4, max_age = 4
, and baseline_metabolism = 0
. Hence, food or pesticide consumption accrued in early life history stages would affect reproductive output in a later life history stage.
sim <- run_farm_sim(mine_output = mg_v1, N = 1000, neutral_loci = 1000,
xdim = 64, ydim = 64, repro = "asexual", max_age = 4,
selfing = TRUE, food_consume = c(2, 2),
pesticide_consume = c(0, 0), food_needed_surv = 1,
food_needed_repr = 1, reproduction_type = "food_based",
pesticide_tolerated_surv = 0, pesticide_rotation_type = 2,
crop_rotation_type = 2, min_age_reproduce = 4,
max_age_feed = 3, farms = 24, time_steps = 48,
mutation_pr = 0.001, crossover_pr = 0.1, net_mu_layers = 0,
crop_rotation_time = 12, pesticide_rotation_time = 12,
crop_per_cell = 4, pesticide_per_cell = 0.4,
crop_number = 2, pesticide_number = 2, min_age_move = 4,
age_food_threshold = 3, min_age_feed = 1,
pesticide_start = 100, print_last = TRUE,
mating_distance = 2, metabolism = 0);
It might often be useful to think of crop_rotation_time
and pesticide_rotation_time
as defining the length of a season, with minimum and maximum ages for individuals defining the relative length of their life history processes. It is important to emphasise that crops are not replenished in each time step, so if, e.g., half of the food on a cell is eaten in a time step, then the amount of food will not be replenished until the next rotation occurs regardless of whether or not the crop type will change. Note that non-crop and pesticide free land can be created by creating more crops or pesticides than can be consumed (e.g., in the above simulation, by setting crop_number = 3
and pesticide_number = 3
, areas of the landscape would be generated with effectively no crops or pesticides because the default food_consume
and pesticide_consume
for the third crop and pesticide would be zero).
run_farm_sim
The output of run_farm_sim
to the console is a list of two elements. The first element is just a list of parameter values run in the model, and the second element is a three dimensional array of the landscape. The most useful output will be some combination of three CSV files printed off when the simulation is run. The first file is 'population_data.csv', which will provide a large table of population level statistics for each time step in the simulation. This is the smallest and often most useful file. Column names in the file provide information about the data.
The second file is 'last_time_step.csv', which will provide a very large able that includes the characteristics (columns) of all individuals (rows) in the last time step of the simulation. This file can be used to make inferences about individual-level variation in the population (e.g., spatial distribution of individuals, correlations between traits and fitness). The number of columns will vary based on simulation parameters, and column names are not currently printed. Nevertheless, a few relevant column numbers are consistent in the information that they provide about an individual:
The trait value columns continue for as many traits as are specified by the mine_gmatrix
output (mine_output
argument in run_farm_sim
). This 'last_time_step.csv' should be printed out with some care; since the entire genomes of each individual are included, large populations can produce very large file sizes.
The third file is 'individuals.csv', which will provide an extremely large table with every individual in the simulation and all of their characteristics (i.e., the same contents as are in 'last_time_step.csv', but for all time steps). This file should not be made unless a trial simulation has been run with a small number of time steps first to ensure that the CSV file will not become too large to store.
run_farm_sim
In all of the examples used above, values were hard coded into the run_farm_sim
function, so the four traits mined in mg_v1
were not actually used. The resevol package allows evolving traits to be substituted for the following individual characteristics set in the arguments of run_farm_sim
: move_distance
, food_needed_surv
, pesticide_tolerated_surv
, food_needed_repr
, pesticide_tolerated_repr
, mating_distance
, lambda_value
, movement_bouts
, metabolism
, food_consume
(up to 10 elements), and pesticide_consume
(up to 10 elements). When a trait is substituted for the argument, an individual's value for the argument is determined by the trait (e.g., the food needed to survive can be determined by a trait unique to each individual rather than set as a numerical value that applies to all individuals). Setting an argument to the trait is done by placing “T[n]” in quotes where [n] is the number of the trait (corresponding to the n row and column of the pre-specified trait covariance matrix). Below shows the last example used above, but with food_consume = c(2, 2)
and pesticide_consume = c(0, 0)
replaced with evolving traits.
sim <- run_farm_sim(mine_output = mg_v1, N = 1000, neutral_loci = 1000,
xdim = 64, ydim = 64, repro = "asexual", max_age = 4,
selfing = TRUE, food_consume = c("T1", "T2"),
pesticide_consume = c("T3", "T4"),
food_needed_surv = 1, food_needed_repr = 1,
reproduction_type = "food_based",
pesticide_tolerated_surv = 0, pesticide_rotation_type = 2,
crop_rotation_type = 2, min_age_reproduce = 4,
max_age_feed = 3, farms = 24, time_steps = 48,
mutation_pr = 0.001, crossover_pr = 0.1, net_mu_layers = 0,
crop_rotation_time = 12, pesticide_rotation_time = 12,
crop_per_cell = 4, pesticide_per_cell = 0.4,
crop_number = 2, pesticide_number = 2, min_age_move = 4,
age_food_threshold = 3, min_age_feed = 1,
pesticide_start = 100, print_last = TRUE,
mating_distance = 2, metabolism = 0);
The run_farm_sim
will recognise now that traits 1 and 2 will determine the amount of consumed on a landscape cell, while traits 3 and 4 will determine the amount of pesticide consumed.
Additional guidance can be found in the package notes on GitHub and the package documentation. Any queries concerning guidance for setting up and running simulations can be posted on the GitHub issues page or emailed to the package maintainer Brad Duthie (see email address in the package DESCRIPTION). This package has mainly been developed to facilitate ongoing modelling work at the University of Stirling, but it is open source and free to use. Comments and questions are welcome.
## Warning in par(oldpar): graphical parameter "cin" cannot be set
## Warning in par(oldpar): graphical parameter "cra" cannot be set
## Warning in par(oldpar): graphical parameter "csi" cannot be set
## Warning in par(oldpar): graphical parameter "cxy" cannot be set
## Warning in par(oldpar): graphical parameter "din" cannot be set
## Warning in par(oldpar): graphical parameter "page" cannot be set