knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The goal of outerbase is to make the production of near-interpolators easy, stable, and scalable. It is built on a C++ backend interfaced with R via {Rcpp}. Under the hood, it leverages unique, custom linear algebra using {RcppArmadillo} (Armadillo) and OpenMP (omp). The overall structure is designed to be interacted with in an object-oriented manner using Rcpp modules.
There are also ways to interact with outerbase for those who are uncomfortable with (or just actively detest) object-oriented programming.
To begin, load the package.
library(outerbase)
Note that if you built this package from source, make sure you use a compiler that can process OpenMP (omp) directives to get the full speed benefits.
To illustrate how to get started with the package, we predict using data from a function with an eight-dimensional input commonly known as the Borehole function. We begin by generating 400 noisy points using the test function ?obtest_borehole8d built into the package.
sampsize = 400
d = 8
x = matrix(runif(sampsize*d), ncol=d) # uniform samples
y = obtest_borehole8d(x) + 0.5*rnorm(sampsize)
Our goal will be to design a predictor for y given x that is a near-interpolator.
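Before fitting anything, it can help to glance at the data we just simulated; the quick checks below use only base R.
dim(x)      # 400 rows, 8 columns
range(x)    # inputs lie in [0, 1]
summary(y)  # noisy responses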
The simplest way to interact with this package is using ?obfit (fitting outerbase). The function requires x, y, and two other inputs: numb and covnames. The value of numb is the number of basis functions you want to use. The choice of numb is still under research, but generally you want it to be as large as tolerable. Play around!
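One rough way to play around is sketched below: hold out part of the data and compare prediction errors for a few choices of numb. This is only illustrative; the specific values are arbitrary, and it uses the mat25pow covariance name and the obpred function, both of which are explained later in this vignette.
idx = sample(sampsize, 320) # illustrative hold-out split
for (nb in c(50, 150, 250)) {
  fit = obfit(x[idx,], y[idx], numb = nb, covnames = rep("mat25pow", 8))
  pred = obpred(fit, x[-idx,])
  print(c(nb, sqrt(mean((y[-idx] - pred$mean)^2)))) # numb and hold-out RMSE
}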
The underlying concepts of this approach come from Gaussian processes, so the core building blocks of the predictors are covariance functions. The covariances needed for obfit are specified as a list of strings, one for each column of x. Type listcov() to discover which covariance functions are currently deployed.
listcov()
If you are curious about any of them, type, e.g., ?covf_mat25pow.
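The covnames vector does not have to repeat a single family; in principle each column of x can get its own entry. The line below is only schematic, with "another_covf" standing in as a placeholder for whichever other name listcov() reports on your installation.
covnames_mixed = c(rep("mat25pow", 4), rep("another_covf", 4)) # "another_covf" is a placeholder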
Note that obfit has some checks in place to prevent serious damage. They are not foolproof. The calls below illustrate misuses that these checks are designed to catch.
obmodel = obfit(x, y, covnames=rep("elephant",8))            # unknown covariance name
obmodel = obfit(x, y[1:200], covnames=rep("mat25pow",5))     # y and covnames lengths do not match x
obmodel = obfit(x[1:2,], y[1:2], covnames=rep("mat25pow",8)) # far too few data points
obmodel = obfit(x, y, numb = 2, covnames=rep("mat25pow",8))  # numb too small
obmodel = obfit(100*x, y, covnames=rep("mat25pow",8))        # inputs scaled far outside the sampled range
obmodel = obfit(0.001*x, y, covnames=rep("mat25pow",8))      # inputs compressed into a tiny range
Below is one correct deployment, where mat25pow is used for all dimensions. This takes a moment to run, roughly a second on most modern computers.
ptm = proc.time()
obmodel = obfit(x, y, numb=300, covnames=rep("mat25pow",8), verbose = 3)
print((proc.time() - ptm)[3])
Note that the package uses custom parallelization at the linear-algebra level. It relies on OpenMP (omp) for parallelization, so if the package was not compiled with OpenMP support there will be no speed benefit. The default call of obfit grabs all available threads, which is ideal for desktops and laptops. It might be less ideal on large clusters where the CPU is shared.
We can adjust the number of threads manually. Below we reduce ourselves to a single thread, which should slow things down.
ptm = proc.time()
obmodel = obfit(x, y, numb=300, covnames=rep("mat25pow",8), nthreads=1) # optional input
print((proc.time() - ptm)[3])
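Rather than hard-coding a count, one reasonable policy on shared machines is to leave a core free. The sketch below uses parallel::detectCores() from base R's parallel package; the right policy depends on your system.
nthreads_use = max(1, parallel::detectCores() - 1) # leave one core free (illustrative policy)
obmodel = obfit(x, y, numb=300, covnames=rep("mat25pow",8), nthreads=nthreads_use)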
We can then predict using ?obpred. While it is not an exact interpolator, it is close.
predtr = obpred(obmodel, x)
rmsetr = sqrt(mean((y-predtr$mean)^2))
plot(predtr$mean, y,
     main=paste("training \n RMSE = ", round(rmsetr,3)),
     xlab="prediction", ylab="actual")
Since we generated this data, we can show that outerbase reasonably predicts the ground truth, meaning overfitting is not an issue.
ytrue = obtest_borehole8d(x)
rmsetr = sqrt(mean((ytrue-predtr$mean)^2))
plot(predtr$mean, ytrue,
     main=paste("oracle \n RMSE = ", round(rmsetr,3)),
     xlab="prediction", ylab="actual")
A set of 1000 test points, generated the same way as our original data, can also serve as verification.
xtest = matrix(runif(1000*d), ncol=d) # prediction points
ytest = obtest_borehole8d(xtest) + 0.5*rnorm(1000)
The predictions at these new points are also quite good. They are not quite as good as the residuals on the training set, but here we are predicting at points the model has never seen.
predtest = obpred(obmodel, xtest)
rmsetst = sqrt(mean((ytest-predtest$mean)^2))
plot(predtest$mean, ytest,
     main=paste("testing \n RMSE = ", round(rmsetst,3)),
     xlab="prediction", ylab="actual")
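Because we also generated the test inputs ourselves, the oracle comparison can be repeated at the new points; the noise-free truth again comes from obtest_borehole8d.
ytesttrue = obtest_borehole8d(xtest)
rmsetst_oracle = sqrt(mean((ytesttrue - predtest$mean)^2)) # RMSE against the noise-free truth
print(rmsetst_oracle)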
This package also produces variances on the predictions, which we can use to check that the stated uncertainty is reasonable. The fact that the second histogram below looks like a standard Normal suggests that both the predictions and their variances are reasonable.
hist((ytest-predtest$mean), main="testing \n residuals", xlab="residuals")
hist((ytest-predtest$mean)/sqrt(predtest$var),
     main="testing \n standardized residuals", xlab="standardized residuals")
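As a complementary rough check on the variances, nominal 95% intervals built from predtest$var should cover roughly 95% of the test responses if the uncertainties are well calibrated.
coverage = mean(abs(ytest - predtest$mean) < 1.96*sqrt(predtest$var)) # empirical coverage of nominal 95% intervals
print(coverage)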