mior: Fit MIOR model to the data

View source: R/mior.R

miorR Documentation

Fit MIOR model to the data

Description

This function fits the MIOR model, proposed by Xiao Y, Liu B, and Hao Z (2018) in "Multiple-instance Ordinal Regression". MIOR is a modified SVM framework with parallel, ordered hyperplanes where the error terms are based only on the instance closest to a midpoint between hyperplanes.

Usage

## Default S3 method:
mior(
  x,
  y,
  bags,
  cost = 1,
  cost_eta = 1,
  method = "qp-heuristic",
  weights = NULL,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, scale = TRUE, verbose = FALSE, time_limit = 60, option =
    c("corrected", "xiao")),
  ...
)

## S3 method for class 'formula'
mior(formula, data, ...)

## S3 method for class 'mi_df'
mior(x, ...)

Arguments

x

A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a mi_df object is passed, y, bags are automatically extracted, and all other columns will be used as predictors.

y

A numeric, character, or factor vector of bag labels for each instance. Must satisfy length(y) == nrow(x). Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.

bags

A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.

cost

The cost parameter in SVM. If method = 'heuristic', this will be fed to kernlab::ksvm(), otherwise it is similarly in internal functions.

cost_eta

The additional cost parameter in MIOR which controls how far away the first and last separating hyperplanes are relative to other costs.

method

The algorithm to use in fitting (default 'heuristic'). When method = 'heuristic', which employs an algorithm similar to Andrews et al. (2003). When method = 'mip', the novel MIP method will be used. When method = 'qp-heuristic, the heuristic algorithm is computed using the dual SVM. See details.

weights

named vector, or TRUE, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If TRUE, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of y.

control

list of additional parameters passed to the method that control computation with the following components:

  • kernel either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level.

  • sigma argument needed for radial basis kernel.

  • max_step argument used when method = 'heuristic'. Maximum steps of iteration for the heuristic algorithm.

  • scale argument used for all methods. A logical for whether to rescale the input before fitting.

  • verbose argument used when method = 'mip'. Whether to message output to the console.

  • time_limit argument used when method = 'mip'. FALSE, or a time limit (in seconds) passed to gurobi() parameters. If FALSE, no time limit is given.

  • option argument the controls the constraint calculation. See details.

...

Arguments passed to or from other methods.

formula

a formula with specification mi(y, bags) ~ x which uses the mi function to create the bag-instance structure. This argument is an alternative to the x, y, bags arguments, but requires the data argument. See examples.

data

If formula is provided, a data.frame or similar from which formula elements will be extracted

Details

Predictions (see predict.mior()) are determined by considering the smallest distance from each point to the midpoint hyperplanes across all instances in the bag. The prediction corresponds to the hyperplane having such a minimal distance.

It appears as though an error in Equation (12) persists to the dual form in (21). A corrected version of this dual formulation can be used with control$option = 'corrected', or the formulation as written can be used with control$option = 'xiao'.

Value

An object of class mior The object contains at least the following components:

  • gurobi_fit: A fit from model optimization that includes relevant components.

  • call_type: A character indicating which method misvm() was called with.

  • features: The names of features used in training.

  • levels: The levels of y that are recorded for future prediction.

  • cost: The cost parameter from function inputs.

  • weights: The calculated weights on the cost parameter.

  • repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.

  • n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.

  • x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

  • default: Method for data.frame-like objects

  • formula: Method for passing formula

  • mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent

References

Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 29(9), 4398-4413. doi: 10.1109/TNNLS.2017.2766164

See Also

predict.misvm() for prediction on new data.

Examples

if (require(gurobi)) {
  set.seed(8)
  # make some data
  n <- 15
  X <- rbind(
    mvtnorm::rmvnorm(n/3, mean = c(4, -2, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(0, 0, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(-2, 1, 0))
  )
  score <- X %*% c(2, -1, 0)
  y <- as.numeric(cut(score, c(-Inf, quantile(score, probs = 1:2 / 3), Inf)))
  bags <- 1:length(y)

  # add in points outside boundaries
  X <- rbind(
    X,
    mvtnorm::rmvnorm(n, mean = c(6, -3, 0)),
    mvtnorm::rmvnorm(n, mean = c(-6, 3, 0))
  )
  y <- c(y, rep(-1, 2*n))
  bags <- rep(bags, 3)
  repr <- c(rep(1, n), rep(0, 2*n))

  y_bag <- classify_bags(y, bags, condense = FALSE)

  mdl1 <- mior(X, y_bag, bags)
  predict(mdl1, X, new_bags = bags)
}


mildsvm documentation built on July 14, 2022, 9:08 a.m.