The package paropt is built in order to optimize parameters of ode-systems. The aim is to match the output of the in silico solution to previously measured states. The user has to supply an ode-system in the form of a Rcpp-function or as an external pointer to a C++ function. The information about states and parameters are passed on as data.frames. In this vignette a predator-prey system is used as example to demonstrate how the functions of ‘paropt’ can be used.
If using a Rcpp-function the following signature has to be fulfilled: ‘Rcpp::NumericVector ode(double t, std::vector
#include <Rcpp.h>
// [[Rcpp::export]]
::NumericVector ode_system(double t, std::vector<double> params,
Rcpp::NumericVector states) {
Rcpp
// set dimension of vector which should be returned
::NumericVector states_deriv(2); //two states in the example above (prey and predator)
Rcpp
// store parameters in variables
double a = params[0];
double b = params[1];
double c = params[2];
double d = params[3];
// store states in variables
double predator = states[0];
double prey = states[1];
// the actual ode-system
double ddtpredator = states_deriv[0] = predator*c*prey - predator*d;
double ddtprey = states_deriv[1] = prey*a - prey*b*predator;
return states_deriv;
}
/*** R
path <- system.file("examples", package = "paropt")
# states
df <- read.table(paste(path,"/states_LV.txt", sep = ""), header = TRUE)
# parameter
lb <- data.frame(time = 0, a = 0.8, b = 0.3, c = 0.09, d = 0.09)
ub <- data.frame(time = 0, a = 1.3, b = 0.7, c = 0.4, d = 0.7)
# Optimizing
library(paropt)
set.seed(1)
output <- optimizer(integration_times = df$time, ode_system = ode_system,
relative_tolerance = 1e-6, absolute_tolerances = c(1e-8, 1e-8),
lb = lb, ub = ub, states = df,
npop = 40, ngen = 1000, error = 0.0001, solvertype = "bdf")
output
# Plots
par(mfrow = c(2,1))
plot(df$time, output$States[,1], pch = 19, type = 'l', ylab = "predator", xlab = "time", ylim = c(0, 30))
points(df$time, states$n1, pch = 19, col = "darkred", type = 'p')
legend(80, 30, legend=c("in silico", "measured"),
col=c("black", "darkred"), lty=1, cex=0.8)
plot(df$time, output$States[,2], pch = 19, type = 'l', ylab = "prey", xlab = "time", ylim = c(0, 65))
points(df$time, states$n2, pch = 19, col = "darkred", type = 'p')
legend(80, 60, legend=c("in silico", "measured"),
col=c("black", "darkred"), lty=1, cex=0.8)
# Solving
optimized_parameter <- as.data.frame(output$Parameter)
names(optimized_parameter) <- c("time", "a", "b", "c", "d")
output_solver <- solve_ode_system(integration_times = df$time, ode_system = ode_system,
relative_tolerance = 1e-6, absolute_tolerances = c(1e-8, 1e-8),
start = optimized_parameter, states = df, solvertype = "bdf")
output_solver
*/
If using an external pointer to a C++ function, the following signature is mandatory: int ode(double &t, std::vector
It is crucial that the function is thread-safe!
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::depends(paropt)]]
// [[Rcpp::plugins(cpp11)]]
typedef int (*OS)(double &t, std::vector<double> ¶ms, std::vector<double> &states);
int ode_system(double &t, std::vector<double> ¶ms, std::vector<double> & states) {
// do not use any R-Code or R-Objects if the optimzation should run in parallel.
// Users have to guarantee that the function can be called by several threads in parallel
// define parameters (vector params contain the parameter in the order as defined in the corresponding textfiles)
double a = params[0];
double b = params[1];
double c = params[2];
double d = params[3];
// states have to be in the same order as specified in the textfile "states_LV.txt" (Otherwise the error-calculation does not work)
double n1 = states[0];
double n2 = states[1];
[0] = n1*c*n2 - n1*d;
states[1] = n2*a - n2*b*n1;
states
return 0;
}
// [[Rcpp::export]]
::XPtr<OS> test_optimization() {
Rcpp::XPtr<OS> xpfun = Rcpp::XPtr<OS>(new OS(&ode_system));
Rcpp
return xpfun;
}
/*** R
#states
path <- system.file("examples", package = "paropt")
states <- read.table(paste(path,"/states_LV.txt", sep = ""), header = T)
# parameter
lb <- data.frame(time = 0, a = 0.8, b = 0.3, c = 0.09, d = 0.09)
ub <- data.frame(time = 0, a = 1.3, b = 0.7, c = 0.4, d = 0.7)
# Optimizing
library(paropt)
set.seed(1)
df <- optimizer_pointer(integration_times = states$time, ode_sys = test_optimization(),
relative_tolerance = 1e-6, absolute_tolerances = c(1e-8, 1e-8),
lower = lb, upper = ub, states = states,
npop = 40, ngen = 1000, error = 0.0001, solvertype = "bdf")
df
# Plots
par(mfrow = c(2,1))
plot(states$time, df$States[,1], pch = 19, type = 'l', ylab = "predator", xlab = "time", ylim = c(0, 30))
points(states$time, states$n1, pch = 19, col = "darkred", type = 'p')
legend(80, 30, legend=c("in silico", "measured"),
col=c("black", "darkred"), lty=1, cex=0.8)
plot(states$time, df$States[,2], pch = 19, type = 'l', ylab = "prey", xlab = "time", ylim = c(0, 65))
points(states$time, states$n2, pch = 19, col = "darkred", type = 'p')
legend(80, 60, legend=c("in silico", "measured"),
col=c("black", "darkred"), lty=1, cex=0.8)
# Solving
optimized_parameter <- as.data.frame(df$Parameter)
names(optimized_parameter) <- c("time", "a", "b", "c", "d")
output_solver <- solve_ode_system_pointer(integration_times = states$time, fctptr = test_optimization(),
relative_tolerance = 1e-6, absolute_tolerances = c(1e-8, 1e-8),
start = optimized_parameter, states = states, solvertype = "bdf")
output_solver
*/
The information of the states has to be supplied as a data.frame. It is compulsory that the first column includes the name time. This column contains the independent variable across which the solver integrates (it does not have to be the time. In this document it is always called time). It is followed by the information of the states at the specific time points. Notably, the order of the states is the same as in the ode-system in order to correctly calculate the error. If a state is not available at all time points use ‘NA’ in order to ignore this state for error calculation at the specified time point. However, the first line below the header cannot contain ‘NA’ as it contains the start values for the ode-solver.
time | predator | prey |
---|---|---|
0 | 10.0000000 | 10.0000000 |
2 | 0.5446191 | 0.9717305 |
4 | 0.0595707 | 0.0508758 |
6 | 0.8024127 | 0.6833731 |
8 | 0.3180103 | 0.1617264 |
10 | 0.8571834 | 0.0382783 |
12 | 0.2734466 | NA |
14 | 0.9054123 | 0.0171890 |
16 | 0.7481677 | 0.3327260 |
18 | 0.3052091 | 0.9640896 |
20 | 0.6480681 | 0.8125060 |
22 | 0.3938566 | 0.0600567 |
24 | 0.6221504 | 0.8747358 |
Constant parameters do not change their value during the integration of the ode-system. The boundaries for these parameters have to be defined in the first row. Notably, the first column of the data.frames containing the lower or upper boundaries is the time column. The name ‘time’ is mandatory for this column. If a parameter is not constant, at least four different time-points are needed. The information for the parameter at the time-points are fed into an interpolation-function in order to calculate values at each time-point the ode-system is called. The interpolation is conducted in a wrapper-function around the actual ode-system. Thus, the parameters passed on to the ode-system are the previously splined values for the specific time-point. For instance if the ode-system is called at t = 3.5 the parameter ‘variable’ is not defined. In this case the parameters have to be interpolated. This is conducted using a Catmull-Rom-Spline. The parameter-vector passed to the ode-system already contains the splined parameters at timepoint t.
time | const | variable |
---|---|---|
0 | 1 | 1 |
2 | NA | 2 |
4 | NA | 3 |
6 | NA | 4 |
8 | NA | 5 |
10 | NA | 6 |
12 | NA | 7 |
14 | NA | 8 |
16 | NA | 9 |
18 | NA | 10 |
20 | NA | 11 |
22 | NA | 12 |
24 | NA | 13 |
time | const | variable |
---|---|---|
0 | 2 | 20 |
2 | NA | 20 |
4 | NA | 20 |
6 | NA | 20 |
8 | NA | 20 |
10 | NA | 20 |
12 | NA | 20 |
14 | NA | 20 |
16 | NA | 20 |
18 | NA | 20 |
20 | NA | 20 |
22 | NA | 20 |
24 | NA | 20 |
During the optimization the optimizer creates a bunch of possible solutions within the parameter boundaries. Each solution is passed to the ode-solver which integrates along the time and returns the states at the time-points specified in the data.frame containing the state-information. The in silico solution is compared to the measured states in order to evaluate the parameterset. The error used is the sum of the absolute differences between in silico and measured states, divided by the number of timepoints.
optimizer(integration_times, ode_system,
relative_tolerance, absolute_tolerances,
lb, ub, states, npop, ngen, error, solvertype)
The first argument is the time-vector containing either the same information as defined in the time-column defined in the data.frame containing the states (see Table1) or only a subset (it can only be shortened at the end).
It is mandatory to start with the first entry of the time-column, however it is possible to stop at a certain time-point before the last one. Thus, it is possible to optimize only a part of the problem.
The next argument is the compiled ode-system followed by the relative tolerance and the absolute tolerances that are used by the ode-solver. These are followed by the data.frames defining the lower and upper boundaries of the parameter. Next the data.frame containing the state information is passed to the function.
In order to optimize the parameters a particle swarm optimizer (PSO) is used. Therefore, the number of particles (npop = 40 is a good starting point for many problems) and the number of generations (ngen) have to be passed to the function. The number of generations defines the maximum number of generations the PSO should run. However, if the PSO finds a suitable parameter set which has an error below or equal a threshold defined by the user it stops. This threshold is defined by the error-argument.
The last argument defines the type of solver to be used. Four different solver exist: ‘bdf,’ ‘ADAMS,’ ‘ERK’ and ‘ARK. All solvers are part of the SUNDIALS project. For details see ’https://computing.llnl.gov/projects/sundials’ and Hindmarsh et al. (2005). The bdf solver is the backward differential formular solver of CVODE and is best suited for stiff problems. It uses a dense matrix module (SUNDense_Matrix) and the corresponding nonlinear solver (SUNLinsol_Dense). The ADAMS solver is the ADAMS-MOULTON solver of CVODE and is most suitable for non-stiff problems. The ‘ERK’ solver is an explicite Runge-Kutta solver which is like the ADAMS-MOULTON solver used for non-stiff-problems. The ‘ARK’ solver is a fully implicte Runge-Kutta-solver which uses the same matrix and nonlinear solver module as ‘bdf.’ The integration itself occurs in a for-loop using the ‘CV_NORMAL’ step-function for all four solvers. If integration for a specific parameter set is not possible the error is set to 1.79769e+308 (which is the maximum of a double). If you want to test a specific parameterset just call the function ode_solving. The function requires the same parameter as the optimizer. Naturally, the arguments for the PSO, the error-threshold as well as the parameter lower- and upper-bounds are not needed.
In principal, the optimizer_pointer and solve_ode_system_pointer do the same as the functions optimizer and solve_ode_system. However, they use an external pointer to a C++ function. Thus, they show a better performance. Notably, it was necessary to change the PSO function for optimizer_pointer in order to permit parallelisation. Therefore even with the same seed the results differ between optimizer and optimizer_pointer.
This PSO-implementation is a modified version of ‘https://github.com/kthohr/optim’ (for a general overview see Sengupta, Basak, and Peters (2018)). The PSO has several key features. First of all, a bunch (number of particles defined by user) of possible parameter sets is created within the boundaries defined by the user. Each parameter set is called a particle. All particles together are called the swarm. Each possible parameter set is passed to the solver which integrates the system. The result is used to calculate the error. Thus, each particle has a current solution and a current personal best value. Furthermore, each particle possesses a neighborhood which consists of 0-3 other particles (for details see Akman, Akman, and Schaefer (2018)). The PSO uses a combination of its history (personal best value) and the information of the best particle of the neighborhood to change its current value. Notably, in this package a randomly adaptive topology is used. This means, that the neighborhood is recalculated each time when the global best solution (best solution of the entire swarm) has not improved within one generation.