iprior: Fit an I-prior regression model
In haziqjamil/iprior: Regression Modelling using I-Priors

iprior

R Documentation

Fit an I-prior regression model

Description

A function to perform regression using I-priors. The I-prior model parameters may be estimated in a number of ways: direct minimisation of the marginal deviance, EM algorithm, fixed hyperparameters, or using a Nystrom kernel approximation.

Usage

## Default S3 method:
iprior(
  y,
  ...,
  kernel = "linear",
  method = "direct",
  control = list(),
  interactions = NULL,
  est.lambda = TRUE,
  est.hurst = FALSE,
  est.lengthscale = FALSE,
  est.offset = FALSE,
  est.psi = TRUE,
  fixed.hyp = NULL,
  lambda = 1,
  psi = 1,
  nystrom = FALSE,
  nys.seed = NULL,
  model = list(),
  train.samp,
  test.samp,
  intercept
)

## S3 method for class 'formula'
iprior(
  formula,
  data,
  kernel = "linear",
  one.lam = FALSE,
  method = "direct",
  control = list(),
  est.lambda = TRUE,
  est.hurst = FALSE,
  est.lengthscale = FALSE,
  est.offset = FALSE,
  est.psi = TRUE,
  fixed.hyp = NULL,
  lambda = 1,
  psi = 1,
  nystrom = FALSE,
  nys.seed = NULL,
  model = list(),
  train.samp,
  test.samp,
  intercept,
  ...
)

## S3 method for class 'ipriorKernel'
iprior(object, method = "direct", control = list(), ...)

## S3 method for class 'ipriorMod'
iprior(object, method = NULL, control = list(), iter.update = 100, ...)

Arguments

`y`	Vector of response variables
`...`	Only used when fitting using non-formula, enter the variables (vectors or matrices) separated by commas.
`kernel`	Character vector indicating the type of kernel for the variables. Available choices are: `"linear"` - (default) for the linear kernel `"canonical"` - alternative name for `"linear"` `"fbm"`, `"fbm,0.5"` - for the fBm kernel with Hurst coefficient 0.5 (default) `"se"`, `"se,1"` - for the SE kernel with lengthscale 1 (default) `"poly"`, `"poly2"`, `"poly2,0"` - for the polynomial kernel of degree 2 with offset 0 (default) `"pearson" - for the Pearson kernel` The `kernel` argument can also be a vector of length equal to the number of variables, therefore it is possible to specify different kernels for each variables. Note that factor type variables are assigned the Pearson kernel by default, and that non-factor types can be forced to use the Pearson kernel (not recommended).
`method`	The estimation method. One of: `"direct"` - for the direct minimisation of the marginal deviance using `optim()`'s L-BFGS method `"em"` - for the EM algorithm `"mixed"` - combination of the direct and EM methods `"fixed"` - for just obtaining the posterior regression function with fixed hyperparameters (default method when setting `fixed.hyp = TRUE`) `"canonical"` - an efficient estimation method which takes advantage of the structure of the linear kernel
`control`	(Optional) A list of control options for the estimation procedure: `maxit` The maximum number of iterations for the quasi-Newton optimisation or the EM algorithm. Defaults to `100`. `em.maxit` For `method = "mixed"`, the number of EM steps before switching to direct optimisation. Defaults to `5`. `stop.crit` The stopping criterion for the EM and L-BFGS algorithm, which is the difference in successive log-likelihood values. Defaults to `1e-8`. `theta0` The initial values for the hyperparameters. Defaults to random starting values. `report` The interval of reporting for the `optim()` function. `restarts` The number of random restarts to perform. Defaults to `0`. It's also possible to set it to `TRUE`, in which case the number of random restarts is set to the total number of available cores. `no.cores` The number of cores in which to do random restarts. Defaults to the total number of available cores. `omega` The overrelaxation parameter for the EM algorithm - a value between 0 and 1.
`interactions`	Character vector to specify the interaction terms. When using formulas, this is specified automatically, so is not required. Syntax is `"a:b"` to indicate variable `a` interacts with variable `b`.
`est.lambda`	Logical. Estimate the scale parameters? Defaults to `TRUE`.
`est.hurst`	Logical. Estimate the Hurst coefficients for fBm kernels? Defaults to `FALSE`.
`est.lengthscale`	Logical. Estimate the lengthscales for SE kernels? Defaults to `FALSE`.
`est.offset`	Logical. Estimate the offsets for polynomial kernels? Defaults to `FALSE`.
`est.psi`	Logical. Estimate the error precision? Defaults to `TRUE`.
`fixed.hyp`	Logical. If `TRUE`, then no hyperparameters are estimated, i.e. all of the above `est.x` are set to `FALSE`, and vice versa. If `NULL` (default) then all of the `est.x` defaults are respected.
`lambda`	Initial/Default scale parameters. Relevant especially if `est.lambda = FALSE`.
`psi`	Initial/Default value for error precision. Relevant especially if `est.psi = FALSE`.
`nystrom`	Either logical or an integer indicating the number of Nystrom samples to take. Defaults to `FALSE`. If `TRUE`, then approximately 10% of the sample size is used for the Nystrom approximation.
`nys.seed`	The random seed for the Nystrom sampling. Defaults to `NULL`, which means the random seed is not fixed.
`model`	DEPRECATED.
`train.samp`	(Optional) A vector indicating which of the data points should be used for training, and the remaining used for testing.
`test.samp`	(Optional) Similar to `train.samp`, but on test samples instead.
`intercept`	Optional intercept term.
`formula`	The formula to fit when using formula interface.
`data`	Data frame containing variables when using formula interface.
`one.lam`	Logical. When using formula input, this is a convenient way of letting the function know to treat all variables as a single variable (i.e. shared scale parameter). Defaults to `FALSE`.
`object`	An `ipriorKernel` or `ipriorMod` object.
`iter.update`	The number of iterations to perform when calling the function on an `ipriorMod` object. Defaults to `100`.

Details

The iprior() function is able to take formula based input and non-formula. When not using formula, the syntax is as per the default S3 method. That is, the response variable is the vector y, and any explanatory variables should follow this, and separated by commas.

As described here, the model can be loaded first into an ipriorKernel object, and then passed to the iprior() function to perform the estimation.

Value

An ipriorMod object. Several accessor functions have been written to obtain pertinent things from the ipriorMod object. The print() and summary() methods display the relevant model information.

Methods (by class)

iprior(ipriorKernel): Takes in object of type ipriorKernel, a loaded and prepared I-prior model, and proceeds to estimate it.
iprior(ipriorMod): Re-run or continue running the EM algorithm from last attained parameter values in object ipriorMod.

Examples


# Formula based input
(mod.stackf <- iprior(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                      data = stackloss))
mod.toothf <- iprior(len ~ supp * dose, data = ToothGrowth)
summary(mod.toothf)

# Non-formula based input
mod.stacknf <- iprior(y = stackloss$stack.loss,
                      Air.Flow = stackloss$Air.Flow,
                      Water.Temp = stackloss$Water.Temp,
                      Acid.Conc. = stackloss$Acid.Conc.)
mod.toothnf <- iprior(y = ToothGrowth$len, ToothGrowth$supp, ToothGrowth$dose,
                      interactions = "1:2")

# Formula based model option one.lam = TRUE
# Sets a single scale parameter for all variables
modf <- iprior(stack.loss ~ ., data = stackloss, one.lam = TRUE)
modnf <- iprior(y = stackloss$stack.loss, X = stackloss[1:3])
all.equal(coef(modnf), coef(modnf))  # both models are equivalent

# Fit models using different kernels
dat <- gen_smooth(n = 100)
mod <- iprior(y ~ X, dat, kernel = "fbm")  # Hurst = 0.5 (default)
mod <- iprior(y ~ X, dat, kernel = "poly3")  # polynomial degree 3

# Fit models using various estimation methods
mod1 <- iprior(y ~ X, dat)
mod2 <- iprior(y ~ X, dat, method = "em")
mod3 <- iprior(y ~ X, dat, method = "canonical")
mod4 <- iprior(y ~ X, dat, method = "mixed")
mod5 <- iprior(y ~ X, dat, method = "fixed", lambda = coef(mod1)[1],
               psi = coef(mod1)[2])
c(logLik(mod1), logLik(mod2), logLik(mod3), logLik(mod4),
  logLik(mod5))

## Not run: 

# For large data sets, it is worth trying the Nystrom method
mod <- iprior(y ~ X, gen_smooth(5000), kernel = "se", nystrom = 50,
              est.lengthscale = TRUE)  # a bit slow
plot_fitted(mod, ci = FALSE)

## End(Not run)

haziqjamil/iprior documentation built on April 2, 2024, 5:26 p.m.