
mildsvm

Weakly supervised (WS), multiple instance (MI) data arises in many applications, such as drug discovery, object detection, and tumor prediction on whole slide images. The mildsvm package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helper functions for building and printing multiple instance data frames.

The mildsvm package implements methods that cover a variety of data types, including:

ordinal and binary labels
weakly supervised and traditional supervised structures
vector-based and distributional-instance rows of data

A full table of functions with references is available below. We highlight two methods based on recent research:

omisvm() runs a novel OMI-SVM approach for ordinal, multiple instance (weakly supervised) data, based on the work of Kent and Yu (2022+).
mismm() runs the MISMM approach for binary, weakly supervised data where each instance can be thought of as a matrix of draws from a distribution. This non-convex SVM approach is formalized and applied to breast cancer diagnosis based on morphological features of the tumor microenvironment in Kent and Yu (2022).

A typical MI data frame (an mi_df) with ordinal labels might look like this, with multiple rows of information for each bag_name and a bag_label that is shared by every row in that bag:

library(mildsvm)
data("ordmvnorm")

print(ordmvnorm)
#> # An MI data frame: 1,000 × 7 with 200 bags
#> # and instance labels: 1, 1, 2, 1, 1, ...
#>    bag_label bag_name    V1     V2      V3       V4     V5
#>  *     <int>    <int> <dbl>  <dbl>   <dbl>    <dbl>  <dbl>
#>  1         2        1 1.55  -0.977  1.33   -0.659   -0.694
#>  2         2        1 0.980 -2.10  -0.618   2.15    -0.718
#>  3         2        1 6.16  -0.275  2.07   -0.624    0.444
#>  4         2        1 2.90  -2.15  -0.0407 -0.0629   1.38 
#>  5         2        1 2.62  -1.70   1.35   -1.66     1.23 
#>  6         4        2 3.39  -0.927  1.95    0.216   -0.164
#>  7         4        2 3.05  -0.930  1.34   -0.457    0.362
#>  8         4        2 6.63  -4.57   4.66   -0.00729  1.03 
#>  9         4        2 4.38  -0.714  2.32    0.0996   0.379
#> 10         4        2 2.43  -4.28   1.08    0.283   -1.14 
#> # … with 990 more rows
# dplyr::distinct(ordmvnorm, bag_label, bag_name)

The mildsvm package uses the formula and predict methods that R users will be familiar with. To indicate that MI data is involved, we specify the bag label and bag name columns with mi(bag_label, bag_name) ~ predictors:

fit <- omisvm(mi(bag_label, bag_name) ~ V1 + V2 + V3,
              data = ordmvnorm, 
              weights = NULL)
print(fit)
#> An misvm object called with omisvm.formula 
#>  
#> Parameters: 
#>   method: qp-heuristic 
#>   kernel: linear  
#>   cost: 1 
#>   h: 1 
#>   s: 4 
#>   scale: TRUE 
#>   weights: FALSE 
#>  
#> Model info: 
#>   Levels of `y`: chr [1:5] "1" "2" "3" "4" "5"
#>   Features: chr [1:3] "V1" "V2" "V3"
#>   Number of iterations: 4
predict(fit, new_data = ordmvnorm)
#> # A tibble: 1,000 × 1
#>    .pred_class
#>    <fct>      
#>  1 2          
#>  2 2          
#>  3 2          
#>  4 2          
#>  5 2          
#>  6 4          
#>  7 4          
#>  8 4          
#>  9 4          
#> 10 4          
#> # … with 990 more rows

Or, if the data frame has the mi_df class, we can directly pass it to the function and all features will be included:

fit2 <- omisvm(ordmvnorm)
#> Warning: Weights are not currently implemented for `omisvm()` when `kernel ==
#> 'linear'`.
print(fit2)
#> An misvm object called with omisvm.mi_df 
#>  
#> Parameters: 
#>   method: qp-heuristic 
#>   kernel: linear  
#>   cost: 1 
#>   h: 1 
#>   s: 4 
#>   scale: TRUE 
#>   weights: FALSE 
#>  
#> Model info: 
#>   Levels of `y`: chr [1:5] "1" "2" "3" "4" "5"
#>   Features: chr [1:5] "V1" "V2" "V3" "V4" "V5"
#>   Number of iterations: 3

mildsvm is not currently on CRAN.

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("skent259/mildsvm")

mildsvm also works well with MI data that has distributional instances (MILD). Such data has a 3-level structure of bags, instances, and samples. As in standard multiple instance learning (MIL), instances are contained within bags, and we observe only the bag label. In MILD, however, each instance represents a distribution, and the samples are draws from that distribution.

You can generate MILD data with generate_mild_df():

# Normal(mean=0, sd=1) vs Normal(mean=3, sd=1)
set.seed(4)
mild_df <- generate_mild_df(
  ncov = 1, nimp_pos = 1, nimp_neg = 1, 
  positive_dist = "mvnormal", positive_mean = 3,
  negative_dist = "mvnormal", negative_mean = 0, 
  nbag = 4,
  ninst = 2, 
  nsample = 2
)
print(mild_df)
#> # An MILD data frame: 16 × 4 with 4 bags, 8 instances
#> # and instance labels: 0, 0, 0, 0, 0, ...
#>    bag_label bag_name instance_name      X1
#>        <dbl> <chr>    <chr>           <dbl>
#>  1         0 bag1     bag1inst1      1.51  
#>  2         0 bag1     bag1inst1     -0.463 
#>  3         0 bag1     bag1inst2      1.79  
#>  4         0 bag1     bag1inst2      1.67  
#>  5         0 bag2     bag2inst1      0.299 
#>  6         0 bag2     bag2inst1      0.666 
#>  7         0 bag2     bag2inst2      0.0118
#>  8         0 bag2     bag2inst2      0.146 
#>  9         1 bag3     bag3inst1      0.546 
#> 10         1 bag3     bag3inst1      0.473 
#> 11         1 bag3     bag3inst2      1.94  
#> 12         1 bag3     bag3inst2      1.25  
#> 13         1 bag4     bag4inst1      1.11  
#> 14         1 bag4     bag4inst1      0.768 
#> 15         1 bag4     bag4inst2      0.111 
#> 16         1 bag4     bag4inst2     -0.290
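
To make the three-level nesting concrete, the sketch below rebuilds a similar bag/instance/sample layout by hand in base R and counts each level. The data frame `toy` and its values are hypothetical; only the column names follow the printout above.

```r
# Hypothetical toy data with the same column layout as `mild_df`:
# 2 bags, 2 instances per bag, 2 samples per instance.
toy <- data.frame(
  bag_label     = rep(c(0, 1), each = 4),
  bag_name      = rep(c("bag1", "bag2"), each = 4),
  instance_name = rep(c("bag1inst1", "bag1inst2", "bag2inst1", "bag2inst2"),
                      each = 2),
  X1            = c(0.1, -0.3, 0.5, 0.2, 2.9, 3.4, 0.0, 0.4)
)

# Count the levels of the hierarchy.
n_bags               <- length(unique(toy$bag_name))
n_instances          <- length(unique(toy$instance_name))
samples_per_instance <- table(toy$instance_name)

# Each bag carries exactly one label, shared by all of its rows.
labels_per_bag <- tapply(toy$bag_label, toy$bag_name,
                         function(x) length(unique(x)))
```

The same counts can be run on a real MILD data frame to confirm that every row belongs to exactly one instance and every instance to exactly one bag.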

You can train an MISMM classifier using mismm() on the MILD data with the mild() formula specification:

fit3 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1, data = mild_df, cost = 100)

# summarize predictions at the bag layer
mild_df %>% 
  dplyr::bind_cols(predict(fit3, mild_df, type = "raw")) %>% 
  dplyr::bind_cols(predict(fit3, mild_df, type = "class")) %>% 
  dplyr::distinct(bag_label, bag_name, .pred, .pred_class)
#> # A tibble: 4 × 4
#>   bag_label bag_name  .pred .pred_class
#>       <dbl> <chr>     <dbl> <fct>      
#> 1         0 bag1     -1.18  0          
#> 2         0 bag2      0.482 1          
#> 3         1 bag3      1.00  1          
#> 4         1 bag4      1.00  1
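
As a quick check on the bag-level output above, the following sketch tabulates true against predicted labels in base R. The values are copied from the printed tibble; the object names (`bag_results`, `confusion`, `accuracy`) are illustrative and not part of mildsvm.

```r
# Bag-level results copied from the printed output above.
bag_results <- data.frame(
  bag_name    = c("bag1", "bag2", "bag3", "bag4"),
  bag_label   = c(0, 0, 1, 1),
  .pred_class = factor(c(0, 1, 1, 1), levels = c(0, 1))
)

# Confusion table: rows are true labels, columns are predictions.
confusion <- table(
  truth     = factor(bag_results$bag_label, levels = c(0, 1)),
  predicted = bag_results$.pred_class
)
print(confusion)

# Proportion of bags classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
```

On these four training bags, three are classified correctly and bag2 is the single miss.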

If you summarize a MILD data set over its samples (for example, by taking the mean of each covariate within each instance), you recover an MIL data set. Use summarize_samples() for this:

mil_df <- summarize_samples(mild_df, .fns = list(mean = mean)) 
print(mil_df)
#> # A tibble: 8 × 4
#>   bag_label bag_name instance_name    mean
#>       <dbl> <chr>    <chr>           <dbl>
#> 1         0 bag1     bag1inst1      0.522 
#> 2         0 bag1     bag1inst2      1.73  
#> 3         0 bag2     bag2inst1      0.483 
#> 4         0 bag2     bag2inst2      0.0791
#> 5         1 bag3     bag3inst1      0.510 
#> 6         1 bag3     bag3inst2      1.59  
#> 7         1 bag4     bag4inst1      0.941 
#> 8         1 bag4     bag4inst2     -0.0896
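
Conceptually, summarizing samples is a grouped aggregation over instances. A rough base R analogue using `aggregate()` on a hypothetical toy data frame is sketched below. It is meant to illustrate the idea only and makes no claim about how `summarize_samples()` works internally.

```r
# Hypothetical MILD-style data: 2 instances, 2 samples each.
toy_mild <- data.frame(
  bag_label     = c(0, 0, 1, 1),
  bag_name      = c("bag1", "bag1", "bag2", "bag2"),
  instance_name = c("bag1inst1", "bag1inst1", "bag2inst1", "bag2inst1"),
  X1            = c(1, 3, 4, 6)
)

# Collapse to one row per instance, replacing X1 by its sample mean.
toy_mil <- aggregate(
  X1 ~ bag_label + bag_name + instance_name,
  data = toy_mild,
  FUN  = mean
)
print(toy_mil)
```

The result has one row per instance, which is the shape that the mi() formula specification expects.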

You can train an MI-SVM classifier using misvm() on MIL data with the helper function mi():

fit4 <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, cost = 100)

print(fit4)
#> An misvm object called with misvm.formula 
#>  
#> Parameters: 
#>   method: heuristic 
#>   kernel: linear  
#>   cost: 100 
#>   scale: TRUE 
#>   weights: ('0' = 0.5, '1' = 1) 
#>  
#> Model info: 
#>   Features: chr "mean"
#>   Number of iterations: 2

| Function      | Method         | Outcome/label | Data type             | Extra libraries | Reference |
|---------------|----------------|---------------|-----------------------|-----------------|-----------|
| omisvm()      | "qp-heuristic" | ordinal       | MI                    | gurobi          | [1]       |
| mismm()       | "heuristic"    | binary        | distributional MI     | —               | [2]       |
| mismm()       | "mip"          | binary        | distributional MI     | gurobi          | [2]       |
| mismm()       | "qp-heuristic" | binary        | distributional MI     | gurobi          | [2]       |
| misvm()       | "heuristic"    | binary        | MI                    | —               | [3]       |
| misvm()       | "mip"          | binary        | MI                    | gurobi          | [3], [2]  |
| misvm()       | "qp-heuristic" | binary        | MI                    | gurobi          | [3]       |
| mior()        | "qp-heuristic" | ordinal       | MI                    | gurobi          | [4]       |
| misvm_orova() | "heuristic"    | ordinal       | MI                    | —               | [3], [1]  |
| misvm_orova() | "mip"          | ordinal       | MI                    | gurobi          | [3], [1]  |
| misvm_orova() | "qp-heuristic" | ordinal       | MI                    | gurobi          | [3], [1]  |
| svor_exc()    | "smo"          | ordinal       | vector                | —               | [5]       |
| smm()         | —              | binary        | distributional vector | —               | [6]       |

Table acronyms

MI: multiple instance
SVM: support vector machine
SMM: support measure machine
OR: ordinal regression
OVA: one-vs-all
MIP: mixed integer programming
QP: quadratic programming
SVOR: support vector ordinal regression
EXC: explicit constraints
SMO: sequential minimal optimization

[1] Kent, S., & Yu, M. (2022+). Ordinal multiple instance support vector machines. In prep.

[2] Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. arXiv preprint arXiv:2206.14704.

[3] Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. Advances in neural information processing systems, 15.

[4] Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 29(9), 4398-4413.

[5] Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. Neural computation, 19(3), 792-815.

[6] Muandet, K., Fukumizu, K., Dinuzzo, F., & Schölkopf, B. (2012). Learning from distributions via support measure machines. Advances in neural information processing systems, 25.