%% note VignetteEngine line – other forms gave “no prebuilt vignette index” msg
optimr
and optimrx
are wrapper R packages to allow the regular R optim()
structure to be applied when using many different optimization
packages available to R users. In particular, we want
optim()
call;control$parscale
;optim()
returns;opm()
function (as such, optimr
replaces the optimx
package);optextras
package;The package optimr
is intended to be available as an official CRAN
package, so has a limited set of underlying optimizers. optimrx
will be
available on alternative package repositories (initially R-forge, in the
optimizer project), and will allow the use of many optimizers. Since
some of these optimizers may become deprecated or otherwise unavailable,
or else not be part of the official R repositories, I expect that the
optimrx
package will, from time to time, suffer glitches and failures.
It is fully intended that users may wish to extend the packages, especially
their optimr()
functions. optimr()
and ctrldefault
are the only places
where the R functions in the packages differ, with ctrldefault.R
listing
the optimizers available in each package and optimr.R
having relevant
code to match. Note that there are multiple
package lists in the ctrldefault()
function for unconstrained, bounded and masked parameters.
There are particularizations in the test scripts, but these
are restricted to the library()
statements.
Package optimrx
may also be expected to contain extra tests,
commentary, and other material of a developmental or experimental nature.
Nevertheless, I believe users may find this larger collection more useful if
they are having to attack difficult problems.
optimr is a package intended to provide improved and extended
function minimization tools for R. Such facilities are
commonly referred to as “optimization”, but the original optim()
function and its replacement in this package, which has the
same name as the package, namely optimr()
,
only allow for the minimization or maximization of nonlinear functions of
multiple parameters subject to at most bounds constraints. Many methods
offer extra facilities, but apart from masks (fixed parameters) for the
hjn
, Rcgmin
and Rvmmin
methods, such features are
likely inaccessible via this package without some customization of the code.
In general, we wish to find the
vector of parameters bestpar
that minimize an objective
function specified by an
R function fn(par, ... )
where par
is the general
vector of parameters, initially provided as a vector that is either the
first argument to optimr()
else specified by a par=
argument, and
the dot arguments are additional information needed to compute the function.
Function minimization methods may require information on the gradient or
Hessian of the function, which we will assume to be furnished, if required,
by functions gr(par, ...)
and hess(par, ....)
. Bounds or box constraints,
if they are to be imposed, are given in the vectors lower
and upper
.
As far as I am aware, all the optimizers included in the optimr
package are
local minimizers. That is, they attempt to find a local minimum of the
objective function. Global optimization is a much bigger problem. Even finding
a local minimum can often be difficult, and to that end, the package uses the
function kktchk()
from package optextras
to test the
Kuhn-Karush-Tucker conditions. These essentially
require that a local minumum has a zero gradient and all nearby points on the
function surface have a greater function value. There are many details that
are ignored in this very brief explanation.
It is intended that optimr
can and should be extended with new optimizers and
perhaps with extra functionality. Moreover, such extensions should be possible for
an R user reasonably proficient in programming R scripts. These changes would, of
course, be made by downloading the source of the package and modifying the R code to
make a new, but only locally available, package. The author does, on the other hand,
welcome suggestions for inclusion in the distributed package, especially if these
have been well-documented and tested. I caution, however, that over 95% of the
effort in building optimr
has been to try to ensure that errors and conflicts are purged, and that the code is
resistant to mistakes in user inputs.
##How the optimr() function (generally) works
optimr()
is an aggregation of wrappers for a number of individual function
minimization (“optimization”) tools available for R. The individual
wrappers are selected by a sequence of if()
statements using the argument
method
in the call to optimr()
.
To add a new optimizer, we need in general terms to carry out the following:
import
ed into optimr
;ctrldefault()
function;if()
statement to select the new “method”;control
list elements of optimr()
into the corresponding control arguments (possibly not in a list of that name
but in one or more other structures, or even arguments or environment variables)
for the new “method”;optimr()
for the nlm()
function, for example);control$parscale
to be applied
if this funtionality is not present in the methods. To my knowledge, only the
base optim()
function methods do not need such special scaling provisions.control
items.A number of methods support the inclusion of box (or bounds) constraints. This
includes the function nmkb()
from package dfoptim
. Unfortunately, this method
uses the transfinite transformation of the objective function to impose the bounds
(Chapter 11 of Nash, 2014), which causes an error if any of the initial parameters
are on one of the bounds.
There are several improvements in the optimr
and optimrx
packages relating to
bounds that would be especially nice to
see, but I do not have any good ideas yet how to implement them all. Among these
unresolved improvements are:
The methods hjn
, Rcgmin
and Rvmmin
(and possibly others, but not obviously accessible
via this package) also permit fixed (masked) parameters. This is useful when we want
to establish an objective function where one or more of the parameters is supplied
a value in most situations, or for which we want to fix a value while we optimize
the other parameters. At another time, we may want to allow such parameters to
be part of the optimization process.
In principle, we could fix parameters in methods that allow bounds constraints by
simply setting the lower and upper bounds equal for the parameters to be masked.
As a computational approach, this is generally a very bad idea, but in the present
optimr()
this is permitted as a way to signal that a parameter is fixed.
The method hjn
is a Hooke and Jeeves axial search method that allows masks and bounds.
It is coded using explicit loops, so will generally be much slower than an implementation
(e.g., hjkb
from dfoptim
or the similar code in package pracma
) that try to
employ vectorized computations. hjn
was included in optimr
and optimrx
to provide
an example of a direct search method with masks. I do NOT recommend it for general use.
There is a possibility that masks could be implemented globally in optimr.
mskd <- which(lower >= upper)
idx <- 1:length(par)
, then idx <- idx[which(lower<=upper)] are parameters that
take part in the optimizationidx
can be passed into the working function and gradient routines efn
and egr
in the same way scaling is performed.At the time of writing (2016-7-11) this has yet to be tried. However, we have got a test of the transformation.
## test masked transformation
# set of 10 parameters
par <- 3*(1:10)
cat("par:")
## par:
print(par)
## [1] 3 6 9 12 15 18 21 24 27 30
cat("Mask 3rd, 5th, 7th\n")
## Mask 3rd, 5th, 7th
bdmsk<-rep(1,10) # indicator of parameters that are free
bdmsk[3] <- 0
bdmsk[5] <- 0
bdmsk[7] <- 0
cat("bdmsk:")
## bdmsk:
print(bdmsk)
## [1] 1 1 0 1 0 1 0 1 1 1
# want to produce xpar which are the reduced parameters
iactive <- which(bdmsk == 1)
cat("iactive (length=",length(iactive),"):")
## iactive (length= 7 ):
print(iactive)
## [1] 1 2 4 6 8 9 10
xpar <- par[iactive]
cat("xpar:")
## xpar:
print(xpar)
## [1] 3 6 12 18 24 27 30
xpar <- - xpar
print("altered xpar:")
## [1] "altered xpar:"
print(xpar)
## [1] -3 -6 -12 -18 -24 -27 -30
cat("expand back to newpar\n")
## expand back to newpar
newpar <- par
newpar[iactive] <- xpar
cat("newpar:")
## newpar:
print(newpar)
## [1] -3 -6 9 -12 15 -18 21 -24 -27 -30
# Need to combine with scaling to get full setup for optimr()
# Then also think of the transfinite approach for bounds on unconstrained
## Or even for bounds methods but using unconstrained part.
The method nlm()
provides a good example of a situation where the default
fn()
and gr()
are inappropriate to the method to be added to optimr()
.
We need a function that returns not only the function value at the parameters
but also the gradient and possibly the hessian. Don't forget the dot arguments
which are the exogenous data for the function!
nlmfn <- function(spar, ...){
f <- efn(spar, ...)
g <- egr(spar, ...)
attr(f,"gradient") <- g
attr(f,"hessian") <- NULL # ?? maybe change later
f
}
Note that we have defined nlmfn
using the scaled parameters spar
and the
scaled function efn
and gradient egr
. That is, we develop the unified
objective plus gradient AFTER the parameters are scaled.
In the present optimr()
, the definition of nlmfn
is put near the top of
optimr()
and it is always loaded. It is the author's understanding that such
functions will always be loaded/interpreted no matter where they are in the
code of a function. For ease of finding the code, and as a former Pascal programmer,
I have put it near the top, as
the structure can be then shared across several similar optimizers. There are
other methods that compute the objective function and gradient at the same set of
parameters. Though nlm()
can make use of Hessian information, we have chosen
here to omit the computation of the Hessian.
Parameter scaling is a feature of the original optim()
but generally not provided in
many other optimizers. It has been included (at times with some difficulty) in the
optimr()
function. The construct is to provide a vector of scaling factors via the
control
list in the element parscale
.
In the tests of the package, and as an example of the use and utility of scaling, we use the Hobbs weed infestation problem (./tests/hobbs15b.R). This is a nonlinear least squares problem to estimate a three-parameter logistic function using data for 12 periods. This problem has a solution near the parameters c(196, 49, 0.3). In the test, we try starting from c(300, 50, 0.3) and from the much less informed c(1,1,1). In both cases, the scaling lets us find the solution more reliably. The timings and number of function and gradient evaluations are, however, not necessarily improved for the methods that “work” (though these measures are all somewhat unreliable because they may be defined or evaluated differently in different methods – we use the information returned by the packages rather than insert counters into functions). However, what values of these measures should we apply for a failed method?
As a warning – having made the mistake myself – scaling must be applied to bounds when calling a bounds-capable method.
optim()
uses control$fnscale
to “scale” the value of the function or gradient computed by
fn
or gr
respectively. In practice, the only use for this scaling is to convert a
maximization to a minimization. Most of the methods applied are function minimization tools,
so that if we want to maximize a function, we minimize its negative. Some methods
actually have
the possibility of maximization, and include a maximize
control. In these cases having both
fnscale
and maximize
could create a conflict. We check for this in optimr()
and try to
ensure both controls are set consistently.
Because different methods use different control parameters, and may even put them into
arguments rather than the control
list, a lot of the code in optimr()
is purely for
translating or transforming the names and values to achieve the desired result. This is
sometimes not possible precisely. A method which uses control$trace = TRUE
(a logical
element) has only “on” or “off” for controlling output. Other methods use an integer
for this trace
object, or call it something else that is an integer, in which case there
are more levels of output possible.
I have found that it is important to remove (i.e., set NULL) controls that are not used for a method. Moreover, since R can leave objects in the workspace, I find it important to set any unused or unwanted control to NULL both before and after calling a method.
Thus, if print.level
is the desired control, and it more or less matches the optimr()
control$trace
, we need to set
print.level <- control$trace
control$trace <- NULL
After the method has run, we may need to reset control$trace
.
When the method we wish to call is not written in R, special care is generally needed
to get a reliable and consistent operation. Typically we call an R routine from optimr()
.
Let us call this routine myop()
. then myop()
will set up and call the underlying
optimizer.
For FORTRAN programs, Nash (2014), Chapter 18, has some suggestions. Particular issues
concern the dot arguments (ellipsis or … entries to allow exogenous data for the
objective function and gradient), which can raise difficulties. In package optimrx
the interface to the lbfgs
shows one approach, which consolidates the arguments for
lbfgs()
into a list, then converts the list to an environment.
It is often convenient to be able to run multiple
optimization methods on the same function and
gradient. To this end, the function opm()
is supplied. The output of this by
default includes
the KKT tests and other informatin in a data frame, and there are convenience methods summary()
and coef()
to allow for display or extraction of results.
opm()
is extremely useful for comparing methods easily. I caution that it is not an efficient
way to run problems, even though it can be extremely helpful in deciding which method to apply
to a class of problems.
An important use of opm()
is to discover cases where methods fail on particular problems and
initial conditions of parameters and settings. This has proven over time to help discover
weaknesses and bugs in codes for different methods. If you find that such cases, and your
code and data can be rendered as an easily executed example, I strongly recommend posting it
to one of the R lists or communicating with the package maintainers. That really is one of the
few ways that our codes come to be improved.
Function polyopt()
is intended to allow for attempts to optimize a function by running different
methods in sequence. The call to polyopt() differs from that of optimr()
or opm()
in the following respects:
method
character argument or character vector is replaced by the methcontrol
array which has a
set of triplets consisting of a method name (character), a function evaluation count and an iteration count.control$maxit
and control$maxfeval
are replaced, if present, with values from the methcontrol
argument list.The methods in methcontrol
are executed in the sequence in which they appear. Each method runs until either
the specified number of iterations (typically gradient evaluations) or function evaluations have been completed,
or termination tests cause the method to be exited,
after which the best set of parameters so far is passed to the next method specified. If there is no further
method, polyopt()
exits.
Polyalgorithms may be useful because some methods such as Nelder-Mead are fairly robust and efficient in finding the region in which a minimum exists, but then very slow to obtain an accurate set of parameters. Gradients at points far from a solution may be such that gradient-based methods do poorly when started far away from a solution, but are very efficient when started “nearby”. Caution, however, is recommended. Such approaches need to be tested for particular applications.
For problems with multiple minima, or which are otherwise difficult to solve, it is sometimes
helpful to attempt an optimization from several starting points. multistart()
is a simple wrapper
to allow this to be carried out. Instead of the vector par
for the starting parameters argument,
however, we now have a matrix parmat
, each row of which is a set of starting parameters.
In setting up this functionality, I chose NOT to allow mixing of multiple starts with a
polyalgorithm or multiple methods. For users really wishing to do this, I believe the
available source codes opm.R
, polyopt.R
and multistart.R
provide a sufficient base
that the required tools can be fashioned fairly easily.
Different methods take different approaches to counting the computational effort of performing optimizations. Sometimes this can make it difficult to compare methods.
Derivative information is used by many optimization methods. In particular, the gradient is the vector of first derivatives of the objective function and the hessian is its second derivative. It is generally non-trivial to write a function for a gradient, and generally a lot of work to write the hessian function.
While there are derivative-free methods, we may also choose to employ numerical
approximations for derivatives. Some of the optimizers called by optimr
automatically
provide a numerical approximation if the gradient function (typically called gr
) is
not provided or set NULL. However, I believe this is open to abuse and also a source
of confusion, since we may not be informed in the results what approximation has been
used.
For example, the package numDeriv
has functions for the
gradient and hessian, and offers three methods, namely a Richardson extrapolation (the default),
a complex step method, and a “simple” method. The last is either a forward of backward approximation
controlled by an extra argument side
to the grad()
function. The complex step method, which offers
essentially analytic precision from a very efficient computation, is unfortunately only applicable if
the objective function has specific properties. That is, according to the documentation:
This method requires that the function be able to handle complex valued
arguments and return the appropriate complex valued result, even though
the user may only be interested in the real-valued derivatives. It also
requires that the complex function be analytic.
The default method of numDeriv
generally requires multiple evaluations of the objective
function to approximate a derivative. The simpler choices, namely, the forward,
backward and central approximations, require respectively 1, 1, and 2 (extra) function evaluations
for each parameter.
To keep the code straightforward, I decided that if an approximate gradient is to be used by
optimr()
(or by extension the multiple method routine opm()
), then the user should specify
the name of the approximation routine as a character string in quotations marks. The package
supplies four gradient approximation functions for this purpose, namely “grfwd” and “grback”
for the forward and backward simple approximations, “grcentral” for the central approximation,
and “grnd” for the default Richardson method via numDeriv
. It should be fairly straightforward
for a user to copy the structure of any of these routines and call their own gradient, but at
the time of writing this (2016-6-29) I have not tried to do so. An example using the complex
step derivative would also be useful to include in this vignette. I welcome contributions!
optimr()
in the packageAs mentioned above, this routine allows a vector of methods to be applied to a given function and (optionally) gradient. The pseudo-method “ALL” (upper case) can be given on its own (not in a vector) to run all available methods. If bounds are given, “ALL” restricts the set to the methods that can deal with bounds.
This routine is an attempt to consolidate the function, gradient and scale checks.
This routine provides default values for the control
vector that is applicable to all the methods
for a given size of problem. The single argument to this function is the number of parameters,
which is used to compute values for termination tolerances and limits on function and gradient
evaluations. However, while I believe the values computed are “reasonable” in general, for
specific problems they may be wildly inappropriate.
optextras
packageOptimization methods share a lot of common infrastructure, and much of this has been
collected in my optextras
package. The routines used in the current packages are as follows.
This routine, which can be called independently for checking the results of other optimization tools, checks the KKT conditions for a given set of parameters that are thought to describe a local optimum.
These have be discussed above under Derivatives.
These functions are provided to allow for detection of user errors in supplied function, gradient or hessian functions. Though we do not yet use hessians in the optimizers called, it is hoped that eventually they can be incorporated.
fnchk()
is mainly a tool to ensure that the supplied function returns a finite scalar value when
evaluated at the supplied parameters.
The other routines use numerical approximations (from numDeriv
) to check the derivative functions
supplied by the user.
This routine is intended to trap errors in setting up bounds and masks for function minimization problems. In particular, we are looking for situations where parameters are outside the bounds or where bounds are impossible to satisfy (e.g., lower > upper). This routine creates an indicator vector called bdmsk whose values are 1 for free parameters, 0 for masked (fixed) parameters, -3 for parameters at their lower bound and -1 for those at their upper bound. (The particular values are related to a coding trick for BASIC in the early 1980s.)
This routine is an attempt to check if the parameters and bounds are roughly similar in their scale. Unequal scaling can result in poor outcomes when trying to optimize functions using derivative free methods that try to search the paramter space. Note that the attempt to include parameter scaling for all methods is intended to provide a work-around for such bad scaling.
One of the unfortunate, and only partially avoidable, inefficiencies in a wrapper function such
as optimr()
is that there will be duplication of much of the setup and error-avoidance that
a properly constructed optimization program requires. That is, both the wrapper and the called
programs will have code to accomplish similar goals. Some of these relate to the following.
Besides these sources of inefficiency, there is a potential cost in both human effort and
program execution if we “specialize” variants of a code. For example, there can be separate
unconstrained and constrained routines, and the wrapper should call the appropriate version.
Rvmmin
has a top-level routine to decide between Rvmminu
and Rvmminb
, but optimr()
takes over this selection. A similar choice exists within dfoptim
for the Hooke and Jeeves
codes. While previously, I would have chosen to separate the bounded and unconstrained routines,
I am now leaning towards a combined routine for the Hooke and Jeeves. First, I discovered that
the separation seems to have introduced a bug, since the code was structured to allow a
similar organization for both choices, where possibly a different structure would have been
better adapted for efficient R. Note, however, that I have not performed appropriate timings
to support this conjecture. Second, I managed to implement a bounds constrained HJ code from
a description in one of my own books in less time than it took to try (not fully successfully)
to correct the code from dfoptim
, in part because the development and stable versions of the
latter are quite different, though both failed the test function bt.f()
example. Ravi Varadhan
has corrected the codes on CRAN.
Note that this is not a criticism of the creators of dfoptim
. I have made similar choices
myself with other packages.
And it is challenging to balance clarity, maintainability, efficiency, and common
structure for a suite of related program codes.
There are examples and tests within the packages. These include
At some future opportunity, I hope to be able to document these tests more fully.
Nash, John C. (2014) Nonlinear Parameter Optimization Using R Tools, Wiley: Chichester.
Nash, John C. and Walker-Smith, Mary (1987) Nonlinear Parameter Estimation: An Integrated System in BASIC", New York: Dekker, now available online with software at https://archive.org/details/NLPE87plus.