omisvm: Fit MI-SVM-OR model to ordinal outcome data

View source: R/omisvm.R

omisvm R Documentation

Description

This function fits a modification of MI-SVM to ordinal outcome data, based on the method proposed by Kent and Yu.

Usage

## Default S3 method:
omisvm(
  x,
  y,
  bags,
  cost = 1,
  h = 1,
  s = Inf,
  method = c("qp-heuristic"),
  weights = TRUE,
  control = list(
    kernel = "linear",
    sigma = if (is.vector(x)) 1 else 1 / ncol(x),
    max_step = 500,
    type = "C-classification",
    scale = TRUE,
    verbose = FALSE,
    time_limit = 60
  ),
  ...
)

## S3 method for class 'formula'
omisvm(formula, data, ...)

## S3 method for class 'mi_df'
omisvm(x, ...)

Arguments

x

A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a mi_df object is passed, y and bags are automatically extracted, and all other columns are used as predictors.

y

A numeric, character, or factor vector of bag labels for each instance. Must satisfy length(y) == nrow(x). It is suggested that one of the levels be 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message is supplied.

bags

A vector specifying which bag each instance belongs to. Can be a string, numeric, or factor.

cost

The cost parameter in SVM. If method = 'heuristic', this will be fed to kernlab::ksvm(); otherwise, it is used similarly in internal functions.

h

A scalar that controls the trade-off between maximizing the margin and minimizing distance between hyperplanes.

s

An integer for how many replication points to add to the dataset. If k represents the number of labels in y, s must satisfy 1 <= s <= k-1. The default, Inf, uses the maximum number of replication points, k-1.
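The bound on s can be illustrated with a small base R sketch. The helper resolve_s() is hypothetical and not part of mildsvm; it only mirrors the rule stated above (Inf maps to k-1, anything else must lie in 1..k-1) and makes no claim about how omisvm() handles this internally.

```r
# Hypothetical helper (not a mildsvm function): resolve `s` against the
# documented constraint 1 <= s <= k - 1, where k is the number of labels in y.
resolve_s <- function(s, y) {
  k <- length(unique(y))
  if (is.infinite(s)) {
    return(k - 1)  # default: use the maximum number of replication points
  }
  stopifnot(s >= 1, s <= k - 1)
  s
}

y <- c(1, 2, 3, 3, 2)   # made-up labels with k = 3 levels
resolve_s(Inf, y)       # 2
resolve_s(1, y)         # 1
```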

method

The algorithm to use in fitting (default 'qp-heuristic', per the Usage signature). When method = 'heuristic', an algorithm similar to Andrews et al. (2003) is employed. When method = 'mip', the novel MIP method is used. When method = 'qp-heuristic', the heuristic algorithm is computed using the dual SVM. See details.

weights

A named vector, or TRUE, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If TRUE, weights are calculated based on inverse counts of instances with a given label, where only one positive instance per bag is counted. Otherwise, names must match the levels of y.
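As a rough illustration of the named-vector form, the sketch below builds weights whose names match the levels of y and shows the per-instance cost that would result if the weights simply scale a scalar cost. The data are made up, and the scaling step is an assumption about what "multiply against the cost vector" means, not a description of the package internals.

```r
# Toy ordinal labels with three levels (made-up data for illustration).
y <- factor(c(1, 1, 2, 3, 3, 3), levels = c(1, 2, 3))

# A named weight vector: the names must match levels(y).
w <- c("1" = 1, "2" = 3, "3" = 1)
stopifnot(setequal(names(w), levels(y)))

# Assumed interpretation: each instance's cost is the scalar cost times the
# weight attached to its label.
cost <- 1
instance_cost <- unname(cost * w[as.character(y)])
instance_cost
```

Here the single instance labeled 2 receives three times the cost of the others, which is the kind of rebalancing a named vector allows.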

control

list of additional parameters passed to the method that control computation with the following components:

  • kernel either a character that describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level.

  • sigma argument needed for radial basis kernel.

  • nystrom_args a list of parameters to pass to kfm_nystrom(). This is used when method = 'mip' and kernel = 'radial' to generate a Nystrom approximation of the kernel features.

  • max_step argument used when method = 'heuristic'. Maximum steps of iteration for the heuristic algorithm.

  • type argument used when method = 'heuristic'. The type argument is passed to e1071::svm().

  • scale argument used for all methods. A logical for whether to rescale the input before fitting.

  • verbose argument used when method = 'mip'. Whether to message output to the console.

  • time_limit argument used when method = 'mip'. FALSE, or a time limit (in seconds) passed to gurobi() parameters. If FALSE, no time limit is given.

  • start argument used when method = 'mip'. If TRUE, the mip program will be warm-started with the solution from method = 'qp-heuristic' to potentially improve speed.
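A control list can be written out explicitly, as in the sketch below. It repeats every default from the Usage section and changes only max_step, so it does not rely on any assumption about whether a partial list is merged with the defaults. The sigma value assumes five covariate columns (1/ncol(x) with the columns used in the Examples), and the fit itself is guarded because it needs both mildsvm and gurobi to be installed.

```r
# Full control list mirroring the documented defaults, with one change.
ctrl <- list(
  kernel = "linear",
  sigma = 1 / 5,              # default rule is 1 / ncol(x); 5 columns assumed
  max_step = 100,             # lowered from the default of 500
  type = "C-classification",
  scale = TRUE,
  verbose = FALSE,
  time_limit = 60
)

# Guarded fit: runs only when mildsvm and gurobi are both available.
if (requireNamespace("mildsvm", quietly = TRUE) &&
    requireNamespace("gurobi", quietly = TRUE)) {
  library(mildsvm)
  data("ordmvnorm")
  fit <- omisvm(ordmvnorm[, 3:7], ordmvnorm$bag_label, ordmvnorm$bag_name,
                weights = NULL, control = ctrl)
}
```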

...

Arguments passed to or from other methods.

formula

a formula with specification mi(y, bags) ~ x which uses the mi function to create the bag-instance structure. This argument is an alternative to the x, y, bags arguments, but requires the data argument. See examples.

data

If formula is provided, a data.frame or similar from which formula elements will be extracted.
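The formula interface might be used as in the following sketch. The helper make_mi_formula() is hypothetical (not part of mildsvm); it pastes predictor names onto the mi(bag_label, bag_name) left-hand side, using the bag_label and bag_name columns from the Examples. Taking columns 3 to 5 of ordmvnorm as predictors is an assumption based on the column range used in the Examples, and the fit is guarded because it requires mildsvm and gurobi.

```r
# Hypothetical helper: build `mi(bag_label, bag_name) ~ p1 + p2 + ...`
# from a character vector of predictor names.
make_mi_formula <- function(predictors) {
  as.formula(paste("mi(bag_label, bag_name) ~",
                   paste(predictors, collapse = " + ")))
}

# Guarded fit: runs only when mildsvm and gurobi are both available.
if (requireNamespace("mildsvm", quietly = TRUE) &&
    requireNamespace("gurobi", quietly = TRUE)) {
  library(mildsvm)
  data("ordmvnorm")
  f <- make_mi_formula(names(ordmvnorm)[3:5])  # assumed covariate columns
  fit <- omisvm(f, data = ordmvnorm, weights = NULL)
}
```

Building the formula from column names avoids hard-coding them; a formula typed by hand with the same mi(y, bags) ~ x shape works equally well.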

Details

Currently, the only method available is a heuristic algorithm in linear SVM space. Additional methods should be available shortly.

Value

An object of class omisvm. The object contains at least the following components:

  • *_fit: A fit object depending on the method parameter. If method = 'qp-heuristic', this will be gurobi_fit from a model optimization.

  • call_type: A character indicating which method omisvm() was called with.

  • features: The names of features used in training.

  • levels: The levels of y that are recorded for future prediction.

  • cost: The cost parameter from function inputs.

  • weights: The calculated weights on the cost parameter.

  • repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.

  • n_step: If method = 'qp-heuristic', the total steps used in the heuristic algorithm.

  • x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

  • default: Method for data.frame-like objects

  • formula: Method for passing formula

  • mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent

See Also

predict.omisvm() for prediction on new data.

Examples

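# The fit relies on the gurobi package, so run only when it is available
if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]          # covariate columns
  y <- ordmvnorm$bag_label       # ordinal bag label for each instance
  bags <- ordmvnorm$bag_name     # bag membership for each instance

  # Fit with the default method and no label weights
  mdl1 <- omisvm(x, y, bags, weights = NULL)
  # Predict on the training data, supplying the bag structure
  predict(mdl1, x, new_bags = bags)
}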


mildsvm documentation built on July 14, 2022, 9:08 a.m.