test_arguments: Test (multiple) arguments of a prediction algorithm
In testarguments: Test (Multiple) Arguments of a User-Defined Prediction Algorithm



Test the performance of a prediction algorithm over a range of argument values. Multiple arguments can be tested simultaneously.

test_arguments(pred_fun, df_train, df_test, diagnostic_fun, arguments)

`pred_fun`	The prediction algorithm to be tested. It should be a function with formal arguments `df_train` and `df_test`, which are the data used to train the model and to test out-of-sample predictive performance, respectively, as well as any arguments that are to be tested. The value returned by `pred_fun` should be a matrix-like object with named columns and the same number of rows as `df_test`.
`df_train`	Training data.
`df_test`	Testing data.
`diagnostic_fun`	A function that computes the criteria used to assess predictive performance. It receives `df_test` combined with the value of `pred_fun` and should return a named vector of diagnostics.
`arguments`	Named list of the arguments to test and the values to check for each.

For each combination of the supplied argument levels, the value of `pred_fun()` is combined with `df_test` using `cbind()`, and the result is passed to `diagnostic_fun()` to compute the diagnostics. Since the number of columns in the returned value of `pred_fun()` is arbitrary, one can test both the predictions and the uncertainty quantification of the predictions (e.g., by including prediction standard errors or predictive interval bounds).
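
The procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's actual implementation: the function name `sketch_test_arguments` and the toy data, `toy_pred_fun`, and `toy_diag` are hypothetical, and the real `test_arguments()` returns a 'testargs' object rather than the plain data frame built here.

```r
## Hypothetical sketch of the loop described above; NOT the package source.
sketch_test_arguments <- function(pred_fun, df_train, df_test,
                                  diagnostic_fun, arguments) {
  ## One row per combination of the supplied argument levels
  grid <- expand.grid(arguments, stringsAsFactors = FALSE)

  diagnostics <- lapply(seq_len(nrow(grid)), function(i) {
    ## Arguments for this combination, as a named list
    args_i <- as.list(grid[i, , drop = FALSE])
    pred   <- do.call(pred_fun,
                      c(list(df_train = df_train, df_test = df_test), args_i))
    ## Attach the predictions to the test data, then score them
    diagnostic_fun(cbind(df_test, pred))
  })

  ## Argument combinations side by side with their diagnostics
  cbind(grid, do.call(rbind, diagnostics))
}

## Toy usage (all names and data below are made up for illustration)
set.seed(1)
toy   <- data.frame(x = seq(-1, 1, length.out = 100))
toy$Z <- 1 + 2 * toy$x + rnorm(100, sd = 0.1)
toy_train <- toy[seq(1, 100, by = 2), ]
toy_test  <- toy[seq(2, 100, by = 2), ]

toy_pred_fun <- function(df_train, df_test, degree) {
  M <- lm(Z ~ poly(x, degree), data = df_train)
  data.frame(fit_Z = predict(M, df_test))
}
toy_diag <- function(df) c(RMSE = sqrt(mean((df$Z - df$fit_Z)^2)))

res <- sketch_test_arguments(toy_pred_fun, toy_train, toy_test,
                             toy_diag, list(degree = 1:3))
print(res)
```

Because `diagnostic_fun()` sees every column returned by `pred_fun()`, adding interval bounds or standard errors to the prediction output makes them available to the diagnostics without changing the loop.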

An object of class 'testargs' containing all information from the testing procedure.

plot_diagnostics, optimal_arguments

library("testarguments")

## Simulate training and testing data
RNGversion("3.6.0"); set.seed(1)
n  <- 1000                                          # sample size
x  <- seq(-1, 1, length.out = n)                    # covariates
mu <- exp(3 + 2 * x * (x - 1) * (x + 1) * (x - 2))  # polynomial function in x
Z  <- rpois(n, mu)                                  # simulate data
df       <- data.frame(x = x, Z = Z, mu = mu)
train_id <- sample(1:n, n/2, replace = FALSE)
df_train <- df[train_id, ]
df_test  <- df[-train_id, ]

## Algorithm that uses df_train to predict over df_test. We use glm(), and
## test the degree of the regression polynomial and the link function.
pred_fun <- function(df_train, df_test, degree, link) {

  M <- glm(Z ~ poly(x, degree), data = df_train,
           family = poisson(link = as.character(link)))

  ## Predict over df_test
  pred <- as.data.frame(predict(M, df_test, type = "link", se.fit = TRUE))

  ## Compute response-level predictions and an approximate 90% interval for
  ## the mean, built on the link scale from se.fit and then back-transformed
  inv_link <- family(M)$linkinv
  fit_Y <- pred$fit
  se_Y  <- pred$se.fit
  pred <- data.frame(fit_Z = inv_link(fit_Y),
                     upr_Z = inv_link(fit_Y + 1.645 * se_Y),
                     lwr_Z = inv_link(fit_Y - 1.645 * se_Y))

  return(pred)
}

## Define diagnostic function. Should return a named vector
diagnostic_fun <- function(df) {
  with(df, c(
    RMSE = sqrt(mean((Z - fit_Z)^2)),
    MAE = mean(abs(Z - fit_Z)),
    coverage = mean(lwr_Z < mu & mu < upr_Z)
  ))
}

## Compute the user-defined diagnostics over a range of argument levels
testargs_object <- test_arguments(
  pred_fun, df_train, df_test, diagnostic_fun,
  arguments = list(degree = 1:6, link = c("log", "sqrt"))
)

## Visualise the performance across all combinations of the supplied arguments
plot_diagnostics(testargs_object)

## Focus on a subset of the tested arguments
plot_diagnostics(testargs_object, focused_args = "degree")

## Compute the optimal arguments for each diagnostic
optimal_arguments(
  testargs_object,
  optimality_criterion = list(coverage = function(x) which.min(abs(x - 0.90)))
)
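
The custom criterion for `coverage` in the last call selects the value closest to the nominal level of 0.90 rather than the largest value. A minimal illustration follows, assuming the criterion function receives the vector of diagnostic values and returns the index of the preferred one, as the `which.min()` form suggests. The coverage values and the name `closest_to_nominal` are hypothetical.

```r
## Hypothetical coverage values for five argument combinations
coverage <- c(0.71, 0.84, 0.91, 0.95, 0.99)

## Same form as the criterion used above: index of the value nearest 0.90
closest_to_nominal <- function(x) which.min(abs(x - 0.90))

best <- closest_to_nominal(coverage)
best            # 3, since 0.91 is nearest to 0.90
coverage[best]  # 0.91

## Taking the largest coverage instead would pick the over-covering 0.99
which.max(coverage)  # 5
```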