The rust package implements the multivariate generalized ratio-of-uniforms method of simulating random variates from a \(d\)-dimensional continuous distribution. The user specifies (the log of) a positive target function \(f(x)\) proportional to the density function of the distribution. For an introduction to rust see the vignette Introducing rust.
This vignette describes a new feature of rust: the option for the user to provide a C++ function to evaluate the target log-density, rather than an R function. The Rcpp (Eddelbuettel 2013) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages are used to speed up simulation from the target density. The improvement results from faster function evaluations and, in particular, from performing the looping in the ratio-of-uniforms algorithm in C++. The new function ru_rcpp requires the target log-density to be specified using (external pointers to) C++ functions, whereas the existing function ru requires R functions as input. Otherwise, the functionality of these two functions is the same. There are also Rcpp-based versions of the functions for setting Box-Cox transformation parameters: find_lambda_rcpp and find_lambda_one_d_rcpp.

In this vignette we describe the general setup of the Rcpp-based functions and use examples to illustrate their use. For more information about these examples see the vignette Introducing rust.
Providing a C++ function to ru_rcpp
The general way that rust enables users to provide their own C++ functions uses external pointers and is based on the Rcpp Gallery article Passing user-supplied C++ functions by Dirk Eddelbuettel. For a detailed case study of the general approach see the RcppDE package (Eddelbuettel 2016) vignette at the RcppDE page on CRAN.
The user writes a C++ function to calculate \(\log f(x)\). The current implementation in rust requires this function to have a particular structure: it must take a constant reference to an Rcpp::NumericVector, say x, a constant reference to an Rcpp::List, say pars, and return a double precision scalar. x is the argument \(x\) of \(f(x)\). pars is a list containing the values of any parameters that are not specified inside the function. This allows the user to change the values of these parameters in the target density without editing the function. If there are no such parameters then the user must still include the argument pars in their function, even though the list provided to the function when it is called will be empty.
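For example, here is a minimal sketch (a hypothetical function, not part of the example user_fns.cpp file shown below) of a one-dimensional normal log-density whose mean mu is extracted from pars; it would sit alongside the includes shown in the excerpt below.

// Hypothetical example: one-dimensional normal with mean mu.
// The value of mu is extracted from the list pars at each call.
// [[Rcpp::export]]
double logdnorm_mu(const Rcpp::NumericVector& x, const Rcpp::List& pars) {
  double mu = pars["mu"] ;
  return -pow(x[0] - mu, 2.0) / 2.0 ;
}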
A simple way for the user to provide their C++ functions is to create them in a file, say user_fns.cpp. Example content is provided below; the full file is available on the rust Github page. The functions in this file are compiled and made available to R either using the Rcpp::sourceCpp function (e.g. Rcpp::sourceCpp("user_fns.cpp")) or using RStudio’s Source button on the editor toolbar. The example content below also includes the function create_xptr, which creates an external pointer to a C++ function: see Passing user-supplied C++ functions. It is this external pointer that is passed to ru_rcpp to perform ratio-of-uniforms sampling. If the user has written a C++ function, say new_name, then they need to add two lines of code to create_xptr:
else if (fstr == "new_name")
return(Rcpp::XPtr<funcPtr>(new funcPtr(&new_name))) ;
to create an external pointer for new_name using create_xptr. The following excerpt from the example user_fns.cpp file contains code for a standard normal density. Note that for this particular example we don’t need RcppArmadillo: we could replace #include <RcppArmadillo.h> with #include <Rcpp.h> and delete using namespace arma;. However, RcppArmadillo is used in the multivariate normal example below and will be useful in many examples.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
using namespace Rcpp;
// [[Rcpp::interfaces(r, cpp)]]
// User-supplied C++ functions for logf.
// Note that currently the only interface available in rust is
// double fun(const Rcpp::NumericVector& x, const Rcpp::List& pars).
// However, as shown in the function logdmvnorm below RcppArmadillo
// functions can be used inside the function.
// Each function must be prefaced by the line: // [[Rcpp::export]]
// One-dimensional standard normal.
// [[Rcpp::export]]
double logdN01(const Rcpp::NumericVector& x, const Rcpp::List& pars) {
  return (-pow(x[0], 2.0) / 2.0) ;
}
// A function to create external pointers for any of the functions above.
// See http://gallery.rcpp.org/articles/passing-cpp-function-pointers/
// If you write a new function above called new_name then add the following
//
// else if (fstr == "new_name")
// return(Rcpp::XPtr<funcPtr>(new funcPtr(&new_name))) ;
// [[Rcpp::export]]
SEXP create_xptr(std::string fstr) {
  typedef double (*funcPtr)(const Rcpp::NumericVector& x,
                            const Rcpp::List& pars) ;
  if (fstr == "logdN01")
    return(Rcpp::XPtr<funcPtr>(new funcPtr(&logdN01))) ;
  else
    return(Rcpp::XPtr<funcPtr>(R_NilValue)) ;
}
// We could create the external pointers when this file is sourced using
// the embedded R code below and/or (re)create them using create_xptr() in
// an R session or R package.
/*** R
ptr_N01 <- create_xptr("logdN01")
*/
ru_rcpp
All the examples in the documentation for ru are replicated in the documentation for ru_rcpp. Here we consider a subset of the examples from the Introducing rust vignette, to illustrate how to provide user-supplied C++ functions to ru_rcpp and to compare the performances of ru and ru_rcpp. We use the microbenchmark package (Mersmann 2015) to make the comparison.
library(rust)
library(Rcpp)
# Is microbenchmark available?
got_microbenchmark <- requireNamespace("microbenchmark", quietly = TRUE)
if (got_microbenchmark) {
  library(microbenchmark)
}
# Set the size of the simulated sample
n <- 1000
It is assumed that the user has already compiled their C++ functions and made them available to their R session, either using the Rcpp::sourceCpp function (e.g. Rcpp::sourceCpp("user_fns.cpp")) or using RStudio’s Source button on the editor toolbar.
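For example, assuming that the file user_fns.cpp is in the current working directory:

# Compile the C++ functions (including create_xptr) and make them available to R
Rcpp::sourceCpp("user_fns.cpp")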
We start with a simple example: the (1-dimensional) standard normal density, based on the C++ function logdN01 in the example user_fns.cpp file above.
# Normal density ===================
# Create a pointer to the logdN01 C++ function
# (not necessary if this was created when the file of C++ functions was sourced)
ptr_N01 <- create_xptr("logdN01")
# Use ru and ru_rcpp starting from the same random number seed and check
# that the simulated values are the same.
set.seed(47)
x_old <- ru(logf = function(x) -x ^ 2 / 2, d = 1, n = n, init = 0.1)
head(x_old$sim_vals)
#> [,1]
#> [1,] 0.7764728
#> [2,] 0.5310434
#> [3,] -0.1046049
#> [4,] 1.2111509
#> [5,] 1.1391379
#> [6,] 0.5180914
set.seed(47)
x_new <- ru_rcpp(logf = ptr_N01, d = 1, n = n, init = 0.1)
head(x_new$sim_vals)
#> [,1]
#> [1,] 0.7764728
#> [2,] 0.5310434
#> [3,] -0.1046049
#> [4,] 1.2111509
#> [5,] 1.1391379
#> [6,] 0.5180914
# Compare performances of ru and ru_rcpp
if (got_microbenchmark) {
  res <- microbenchmark(
    old = ru(logf = function(x) -x ^ 2 / 2, d = 1, n = n, init = 0.1),
    new = ru_rcpp(logf = ptr_N01, d = 1, n = n, init = 0.1)
  )
  print(res, signif = 4)
}
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> old 20.860 22.030 24.300 23.11 25.220 92.970 100
#> new 2.256 2.422 2.679 2.51 2.655 6.385 100
As we would hope, ru_rcpp is faster than ru. If we start from the same random number seed we get the same simulated values from ru and ru_rcpp.
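One way to confirm this directly is a quick check along the following lines (output not shown here):

# The two matrices of simulated values should agree
all.equal(x_old$sim_vals, x_new$sim_vals)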
The next example is a three-dimensional normal density with strong positive association between the variables. To perform this example we add the following function to user_fns.cpp:
// d-dimensional normal with zero-mean and covariance matrix sigma.
// [[Rcpp::export]]
double logdmvnorm(const Rcpp::NumericVector& x, const Rcpp::List& pars) {
  arma::mat sigma = as<arma::mat>(pars["sigma"]) ;
  arma::vec y = Rcpp::as<arma::vec>(x) ;
  double qform = arma::as_scalar(y.t() * arma::inv(sigma) * y) ;
  return -qform / 2.0 ;
}
and add
else if (fstr == "logdmvnorm")
return(Rcpp::XPtr<funcPtr>(new funcPtr(&logdmvnorm))) ;
to the function create_xptr in user_fns.cpp.
# Three-dimensional normal with positive association ----------------
rho <- 0.9
covmat <- matrix(rho, 3, 3) + diag(1 - rho, 3)
# R function
log_dmvnorm <- function(x, mean = rep(0, d), sigma = diag(d)) {
  x <- matrix(x, ncol = length(x))
  d <- ncol(x)
  - 0.5 * (x - mean) %*% solve(sigma) %*% t(x - mean)
}
# Create a pointer to the logdmvnorm C++ function
ptr_mvn <- create_xptr("logdmvnorm")
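Because logdmvnorm is tagged with // [[Rcpp::export]] it is also callable directly from R once user_fns.cpp has been sourced, so a quick sanity check (output not shown) is to confirm that it agrees with the R function log_dmvnorm at a test point:

# Evaluate both log-densities at the same point; the values should agree
x_test <- c(0.1, 0.2, 0.3)
logdmvnorm(x_test, list(sigma = covmat))
log_dmvnorm(x_test, sigma = covmat)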
if (got_microbenchmark) {
  res <- microbenchmark(
    old = ru(logf = log_dmvnorm, sigma = covmat, d = 3, n = n,
             init = c(0, 0, 0)),
    new = ru_rcpp(logf = ptr_mvn, sigma = covmat, d = 3, n = n,
                  init = c(0, 0, 0))
  )
  print(res, signif = 4)
}
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> old 204.600 213.70 219.2 218.000 224.30 284.20 100
#> new 8.593 9.04 9.6 9.274 9.61 16.16 100
Again, the improvement in speed obtained using Rcpp is clear.
The next example is a lognormal density. We use a log transform (Box-Cox parameter \(\lambda = 0\)) so that the ratio-of-uniforms sampling is based on a normal distribution. The C++ function to calculate the log-density of a lognormal distribution is:
// Lognormal(mu, sigma).
// [[Rcpp::export]]
double logdlnorm(const Rcpp::NumericVector& x, const Rcpp::List& pars) {
  double mu = pars["mu"] ;
  double sigma = pars["sigma"] ;
  if (x[0] > 0)
    return -log(x[0]) - pow(log(x[0]) - mu, 2.0) / (2.0 * pow(sigma, 2.0)) ;
  else
    return R_NegInf ;
}
ptr_lnorm <- create_xptr("logdlnorm")
if (got_microbenchmark) {
  res <- microbenchmark(
    old = ru(logf = dlnorm, log = TRUE, d = 1, n = n, lower = 0, init = 0.1,
             trans = "BC", lambda = 0),
    new = ru_rcpp(logf = ptr_lnorm, mu = 0, sigma = 1, d = 1, n = n,
                  lower = 0, init = 0.1, trans = "BC", lambda = 0)
  )
  print(res, signif = 4)
}
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> old 32.58 34.430 36.930 36.320 37.700 101.200 100
#> new 5.05 5.257 5.743 5.457 5.738 9.189 100
The final ru_rcpp example involves a Bayesian analysis with a Generalized Pareto (GP) posterior distribution. The C++ function to calculate the log-posterior density is:
// Generalized Pareto posterior based on an MDI prior truncated to
// shape parameter xi >= -1.
// [[Rcpp::export]]
double loggp(const Rcpp::NumericVector& x, const Rcpp::List& ss) {
  Rcpp::NumericVector gpd_data = ss["gpd_data"] ;
  int m = ss["m"] ;
  double xm = ss["xm"] ;
  double sum_gp = ss["sum_gp"] ;
  if (x[0] <= 0 || x[1] <= -x[0] / xm)
    return R_NegInf ;
  double loglik ;
  Rcpp::NumericVector sdat = gpd_data / x[0] ;
  Rcpp::NumericVector zz = 1 + x[1] * sdat ;
  if (std::abs(x[1]) > 1e-6) {
    loglik = -m * log(x[0]) - (1.0 + 1.0 / x[1]) * sum(log(zz)) ;
  } else {
    double t1, t2, sdatj ;
    double total = 0;
    for(int j = 0; j < m; ++j) {
      sdatj = sdat[j] ;
      for(int i = 1; i < 5; ++i) {
        t1 = pow(sdatj, i) ;
        t2 = (i * sdatj - i - 1) ;
        total += pow(-1.0, i) * t1 * t2 * pow(x[1], i) / i / (i + 1) ;
      }
    }
    loglik = -m * log(x[0]) - sum_gp / x[0] - total ;
  }
  // MDI prior.
  if (x[1] < -1)
    return R_NegInf ;
  double logprior = -log(x[0]) - x[1] - 1 ;
  return (logprior + loglik) ;
}
We simulate some data from a Generalized Pareto distribution, calculate summary statistics involved in the likelihood and calculate an initial value in the search for the posterior mode.
set.seed(46)
# Sample data from a GP(sigma, xi) distribution
gpd_data <- rgpd(m = 100, xi = -0.5, sigma = 1)
# Calculate summary statistics for use in the log-likelihood
ss <- gpd_sum_stats(gpd_data)
# Calculate an initial estimate
init <- c(mean(gpd_data), 0)
Again, in the comparison below, we see that ru_rcpp is substantially faster than ru.
# Arguments for ru_rcpp
ptr_gp <- create_xptr("loggp")
for_ru_rcpp <- c(list(logf = ptr_gp, init = init, d = 2, n = n,
                      lower = c(0, -Inf)), ss)
if (got_microbenchmark) {
  res <- microbenchmark(
    old = ru(logf = gpd_logpost, ss = ss, d = 2, n = n, init = init,
             lower = c(0, -Inf)),
    new = do.call(ru_rcpp, for_ru_rcpp)
  )
  print(res, signif = 4)
}
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> old 87.97 93.78 97.00 96.56 99.28 131.80 100
#> new 17.97 18.67 19.87 19.15 20.23 26.11 100
find_lambda_one_d_rcpp and find_lambda_rcpp
We repeat two examples from the Introducing rust vignette.
find_lambda_one_d_rcpp
The target is a gamma(alpha, 1) density. We make use of the Rcpp sugar function dgamma.
// Gamma(alpha, 1).
// [[Rcpp::export]]
double logdgamma(const Rcpp::NumericVector& x, const Rcpp::List& pars) {
  double shp = pars["alpha"] ;
  return Rcpp::dgamma(x, shp, 1.0, 1)[0] ;
}
alpha <- 1
max_phi <- qgamma(0.999, shape = alpha)
ptr_gam <- create_xptr("logdgamma")
lambda <- find_lambda_one_d_rcpp(logf = ptr_gam, alpha = alpha,
                                 max_phi = max_phi)
lambda
#> $lambda
#> [1] 0.2727968
#>
#> $gm
#> [1] 0.5689906
#>
#> $init_psi
#> [1] -0.2016904
#>
#> $sd_psi
#> [1] 0.7835109
#>
#> $user_args
#> list()
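As in the corresponding example in the Introducing rust vignette, the list returned by find_lambda_one_d_rcpp can then be supplied to ru_rcpp via its lambda argument. A minimal sketch (not run here):

# Sample from the gamma density using the Box-Cox parameter found above
x_gam <- ru_rcpp(logf = ptr_gam, alpha = alpha, d = 1, n = n, trans = "BC",
                 lambda = lambda)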
find_lambda_rcpp
In this example we supply an external pointer to a C++ function phi_to_theta that ensures that both parameters of the model are strictly positive, a requirement for the Box-Cox transformation to be applicable. The function phi_to_theta must have the same argument structure as the function used to calculate \(\log f\). See Providing a C++ function to ru_rcpp for details. See the Introducing rust vignette for the form of the transformation.
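As an illustration, the following is a sketch of what such a function could look like for the GP example (an assumed form; the exact code in the example user_fns.cpp file may differ). A pointer to the packaged version is created below using create_phi_to_theta_xptr("gp"). Unlike the log-density functions, it returns a vector, the value of \(\theta\).

// Sketch (assumed form) of a phi_to_theta function for the GP example:
// theta[0] = phi[0] and theta[1] = phi[1] - phi[0] / xm, with xm supplied
// in the list user_args.
// [[Rcpp::export]]
Rcpp::NumericVector gp_phi_to_theta(const Rcpp::NumericVector& phi,
                                    const Rcpp::List& user_args) {
  double xm = user_args["xm"] ;
  Rcpp::NumericVector val(2) ;
  val[0] = phi[0] ;
  val[1] = phi[1] - phi[0] / xm ;
  return val ;
}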
temp <- do.call(gpd_init, ss)
min_phi <- pmax(0, temp$init_phi - 2 * temp$se_phi)
max_phi <- pmax(0, temp$init_phi + 2 * temp$se_phi)

# Create external pointers
ptr_gp <- create_xptr("loggp")
ptr_phi_to_theta_gp <- create_phi_to_theta_xptr("gp")
# Note: log_j is set to zero by default inside find_lambda_rcpp()
lambda <- find_lambda_rcpp(logf = ptr_gp, ss = ss, d = 2, min_phi = min_phi,
                           max_phi = max_phi, user_args = list(xm = ss$xm),
                           phi_to_theta = ptr_phi_to_theta_gp)
lambda
#> $lambda
#> [1] 0.1624226 0.3678549
#>
#> $gm
#> [1] 1.10542493 0.03225836
#>
#> $init_psi
#> [1] 0.1054021 -0.2184344
#>
#> $sd_psi
#> Var1 Var2
#> 0.12670792 0.02477219
#>
#> $phi_to_theta
#> <pointer: 0x000000000db1b100>
#>
#> $log_j
#> <pointer: 0x000000000db1af50>
#>
#> $user_args
#> $user_args$xm
#> [1] 1.846219