mismm: Fit MILD-SVM model to the data
In mildsvm: Multiple-Instance Learning with Support Vector Machines

mismm

R Documentation

Description

This function fits the MILD-SVM model, which takes a multiple-instance learning with distributions (MILD) data set and fits a modified SVM to it. The MILD-SVM methodology is based on research in progress.

Usage

## Default S3 method:
mismm(
  x,
  y,
  bags,
  instances,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    scale = TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

## S3 method for class 'formula'
mismm(formula, data, ...)

## S3 method for class 'mild_df'
mismm(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents a sample. If a `mild_df` object is passed, `y, bags, instances` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Ideally, one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message is printed.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, or factor.
`instances`	A vector specifying which samples belong to each instance. Can be a string, numeric, or factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`; otherwise it is used similarly in internal functions.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, the algorithm iterates between selecting positive witnesses and solving an underlying `smm()` problem. When `method = 'mip'`, the novel MIP method will be used. When `method = 'qp-heuristic'`, the heuristic algorithm is computed using a slightly modified dual SMM. See details.
`weights`	A named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with a given label, where only one positive instance is counted per bag. Otherwise, names must match the levels of `y`.
`control`	A list of additional parameters passed to the method that control computation, with the following components:
  `kernel`: either a character string that describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level.
  `sigma`: argument needed for the radial basis kernel.
  `nystrom_args`: a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features.
  `max_step`: argument used when `method = 'heuristic'`. Maximum number of iteration steps for the heuristic algorithm.
  `scale`: argument used for all methods. A logical for whether to rescale the input before fitting.
  `verbose`: argument used when `method = 'mip'`. Whether to print messages to the console.
  `time_limit`: argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given.
  `start`: argument used when `method = 'mip'`. If `TRUE`, the MIP program will be warm-started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	A formula with specification `mild(y, bags, instances) ~ x` which uses the `mild` function to create the bag-instance structure. This argument is an alternative to the `x, y, bags, instances` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.
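
As an illustration of the `control$kernel` and `control$sigma` components, the following base R sketch builds an instance-level radial kernel matrix from sample-level data. It assumes the usual kernel mean embedding construction (average the sample-level kernel over every pair of samples from two instances) and the `exp(-sigma * ||a - b||^2)` form of the radial kernel; whether `mismm()` computes its kernel in exactly this way is an assumption, and the names `k_samples`, `member`, and `k_inst` are made up for this example.

```r
set.seed(42)

# Toy MILD layout: 3 instances with 5 samples each, 2 covariates
x <- matrix(rnorm(15 * 2), ncol = 2)
instances <- rep(c("inst1", "inst2", "inst3"), each = 5)

# Default bandwidth as written in the Usage section
sigma <- if (is.vector(x)) 1 else 1 / ncol(x)

# Sample-level radial kernel (assumed form: exp(-sigma * squared distance))
k_samples <- exp(-sigma * as.matrix(dist(x))^2)

# Averaging weights: column j holds 1 / n_j for the samples of instance j
member <- outer(instances, unique(instances), "==")
w <- sweep(member, 2, colSums(member), "/")

# Instance-level kernel: mean of the sample-level kernel over each instance pair
k_inst <- t(w) %*% k_samples %*% w
dimnames(k_inst) <- list(unique(instances), unique(instances))
k_inst
```

A matrix of this shape (one row and column per instance) is the kind of object the `kernel` component describes as "a kernel matrix at the instance level"; the row order the function expects is not stated here, so check it before passing a precomputed matrix.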

Details

Several choices of fitting algorithm are available, including a version of the heuristic algorithm proposed by Andrews et al. (2003) and a novel algorithm that explicitly solves the mixed-integer programming (MIP) problem using the gurobi package optimization back-end.
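
The alternating structure of the heuristic (select one witness instance per positive bag, refit, reselect) can be sketched in base R. This is only an outline of the loop under simplifying assumptions: it works on toy instance-level features, and `fit_scorer()` is a hypothetical nearest-centroid stand-in for the `smm()` / `kernlab::ksvm()` fit that the package uses, so the numbers it produces are not what `mismm()` would return.

```r
set.seed(1)

# Toy instance-level features: 8 bags with 4 instances each (first 4 bags positive)
n_bags <- 8
inst_per_bag <- 4
bag <- rep(seq_len(n_bags), each = inst_per_bag)
bag_label <- rep(c(1, 1, 1, 1, 0, 0, 0, 0), each = inst_per_bag)
feat <- matrix(rnorm(length(bag) * 2), ncol = 2)

# Shift the first instance of each positive bag so it acts as the true witness
true_witness <- which(bag_label == 1 & !duplicated(bag))
feat[true_witness, ] <- feat[true_witness, ] + 3

# Stand-in classifier (NOT an SVM): project onto the difference of class centroids
fit_scorer <- function(pos, neg) {
  w <- colMeans(pos) - colMeans(neg)
  b <- -sum(w * (colMeans(pos) + colMeans(neg)) / 2)
  function(z) drop(z %*% w) + b
}

pos_bags <- unique(bag[bag_label == 1])
neg_idx <- which(bag_label == 0)
max_step <- 500

# Initial guess: treat every instance in a positive bag as positive
selected <- which(bag_label == 1)
for (n_step in seq_len(max_step)) {
  # Fit on the current witnesses versus all instances from negative bags
  score <- fit_scorer(feat[selected, , drop = FALSE],
                      feat[neg_idx, , drop = FALSE])
  s <- score(feat)
  # Reselect the highest-scoring instance in each positive bag
  new_selected <- vapply(pos_bags, function(b) {
    idx <- which(bag == b)
    idx[which.max(s[idx])]
  }, integer(1))
  # Stop once the set of witnesses no longer changes
  if (setequal(new_selected, selected)) break
  selected <- new_selected
}

selected  # one witness index per positive bag
n_step    # number of iterations used
```

The loop is capped at `max_step` iterations, mirroring the `max_step` control parameter, and stops early once the selected witnesses are stable.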

Value

An object of class mismm. The object contains at least the following components:

*_fit: A fit object depending on the method parameter. If method = 'heuristic', this will be a ksvm fit from the kernlab package. If method = 'mip', this will be a gurobi_fit from a model optimization.
call_type: A character indicating which method mismm() was called with.
x: The training data needed for computing the kernel matrix in prediction.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
weights: The calculated weights on the cost parameter.
sigma: The radial basis function kernel parameter.
repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.
useful_inst_idx: The instances that were selected to represent the bags in the heuristic fitting.
inst_order: A character vector that is used to modify the ordering of input data.
x_scale: If scale = TRUE, the scaling parameters for new predictions.
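
For the `weights` component, a rough base R sketch of what "inverse counts of instances with a given label, counting one positive instance per bag" could look like is given below. The exact rule used by `mismm()` is not spelled out on this page, so two choices here are assumptions: only instances in negative bags are counted as negative, and the weights are rescaled so that the positive class has weight 1.

```r
# One row per instance: the bag it belongs to and that bag's label
inst <- data.frame(
  bag = c("b1", "b1", "b1", "b2", "b2", "b3", "b3", "b3", "b3"),
  bag_label = c(1, 1, 1, 1, 1, 0, 0, 0, 0)
)

# Positive count: one instance per positive bag
n_pos <- length(unique(inst$bag[inst$bag_label == 1]))
# Negative count (assumption): every instance in a negative bag
n_neg <- sum(inst$bag_label == 0)

# Inverse counts, named by the levels of y
raw <- c("0" = 1 / n_neg, "1" = 1 / n_pos)

# Assumed normalization: positive class gets weight 1
weights <- raw / raw[["1"]]
weights  # "0" = 0.5, "1" = 1
```

The names `"0"` and `"1"` follow the rule that a user-supplied named vector must match the levels of `y`.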

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mild_df: Method for mild_df objects

Author(s)

Sean Kent, Yifei Liu

References

Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. arXiv preprint arXiv:2206.14704.

Examples

set.seed(8)
mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

# Heuristic method
mdl1 <- mismm(mil_data)
mdl2 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1 + X2 + X3, data = mil_data)

# MIP method
if (require(gurobi)) {
  mdl3 <- mismm(mil_data, method = "mip", control = list(nystrom_args = list(m = 10, r = 10)))
  predict(mdl3, mil_data)
}

predict(mdl1, new_data = mil_data, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
mil_data %>%
  bind_cols(predict(mdl2, mil_data, type = "class")) %>%
  bind_cols(predict(mdl2, mil_data, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

mildsvm documentation built on July 14, 2022, 9:08 a.m.