# SODA algorithm for variable and interaction selection

### Description

SODA is a forward-backward variable and interaction selection algorithm for the logistic regression model with second-order terms. In the forward stage, a stepwise procedure screens for important predictors with both main and interaction effects; in the backward stage, SODA removes insignificant terms so as to optimize the extended BIC (EBIC) criterion. SODA is applicable to variable selection for logistic regression, linear/quadratic discriminant analysis, and other discriminant analysis models whose generative model lies in the exponential family.
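To make "second-order terms" concrete: for a chosen set of variables, the linear predictor of the logistic model can contain each main effect x_j, each squared term x_j^2, and each pairwise product x_j * x_k. The sketch below builds those columns in base R. `second_order_terms()` is a hypothetical helper written for illustration, not part of SODA's API, and the column naming scheme is an assumption.

```r
# Hypothetical helper (not part of SODA): expand the columns `vars` of `xx`
# into main effects, squared terms, and pairwise interactions.
second_order_terms <- function(xx, vars) {
  xx <- as.matrix(xx)
  if (is.null(colnames(xx))) colnames(xx) <- paste0("x", seq_len(ncol(xx)))
  nm <- colnames(xx)[vars]
  out <- xx[, vars, drop = FALSE]                  # main effects x_j
  sq <- xx[, vars, drop = FALSE]^2                 # squared terms x_j^2
  colnames(sq) <- paste0(nm, "^2")
  out <- cbind(out, sq)
  if (length(vars) >= 2) {
    pairs <- combn(vars, 2)                        # all pairs j < k
    inter <- apply(pairs, 2, function(jk) xx[, jk[1]] * xx[, jk[2]])
    inter <- matrix(inter, nrow = nrow(xx))        # keep matrix shape when n == 1
    colnames(inter) <- paste0(colnames(xx)[pairs[1, ]], ":", colnames(xx)[pairs[2, ]])
    out <- cbind(out, inter)
  }
  out
}

# Usage: 3 variables give 3 main + 3 squared + 3 interaction columns.
X <- matrix(1:12, nrow = 4)
Z <- second_order_terms(X, 1:3)
dim(Z)   # expected: 4 9
```

With p variables the number of candidate second-order terms grows like p^2 / 2, which is why a screening stage is needed before fitting in high dimensions.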

### Usage


### Arguments

| Argument | Description |
|---|---|
| `xx` | The design matrix of dimensions n * p, without an intercept column. Each row is an observation vector. |
| `yy` | The response vector of dimension n * 1. |
| `norm` | Logical flag: if TRUE, each variable (column) of `xx` is quantile-normalized to the standard normal distribution before SODA is run. Default is norm=FALSE. Quantile normalization is recommended when the data contain obvious outliers. |
| `debug` | Logical flag for printing debug information. |
| `gam` | Tuning parameter gamma in the extended BIC criterion. For a selected term set S containing k terms, EBIC = -2 * log-likelihood + k * log(n) + 2 * k * gamma * log(p). Larger values of gamma penalize model size more heavily. |
| `minF` | Minimum number of steps in forward interaction screening. Default is minF=3. |
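The two numeric choices behind `norm` and `gam` can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions: `quantile_normalize()` and `ebic()` are hypothetical helpers, the normalization is taken to be a rank-based inverse-normal transform (SODA's exact rank offset may differ), and the intercept is assumed not to count toward the number of terms k.

```r
# Assumption: rank-based inverse-normal transform; SODA's exact offset may differ.
quantile_normalize <- function(xx) {
  xx <- as.matrix(xx)
  n <- nrow(xx)
  # map each column's ranks to standard-normal quantiles; ties get average ranks
  apply(xx, 2, function(col) qnorm(rank(col) / (n + 1)))
}

# EBIC as written in the `gam` entry; k = number of selected terms
# (assumption: the intercept is not counted).
ebic <- function(loglik, k, n, p, gam) {
  -2 * loglik + k * log(n) + 2 * k * gam * log(p)
}

# Usage: score a one-term logistic fit on simulated data.
set.seed(1)
n <- 200
p <- 50
X <- matrix(rnorm(n * p), n, p)
X[1, 1] <- 50                     # inject an obvious outlier
Xn <- quantile_normalize(X)
range(Xn[, 1])                    # bounded: the outlier becomes the largest rank only
y <- rbinom(n, 1, plogis(1 + Xn[, 1]))
fit <- glm(y ~ Xn[, 1], family = binomial())
ebic(as.numeric(logLik(fit)), k = 1, n = n, p = p, gam = 0.5)
```

Setting gam = 0 reduces the score to ordinary BIC; the extra 2 * k * gam * log(p) term is what accounts for the size of the candidate space when p is large.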

### Value

| Component | Description |
|---|---|
| `EBIC` | Trace of the extended Bayesian information criterion (EBIC) score at each step. |
| `Type` | Trace of step types ("Forward (Main)", "Forward (Int)", "Backward"). |
| `Var` | Trace of selected variables. |
| `Term` | Trace of selected main and interaction terms. |
| `final_EBIC` | EBIC score of the final selected term set. |
| `final_Var` | Final selected variables. |
| `final_Term` | Final selected main and interaction terms. |
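The `EBIC` and `Term` traces record how the score changes as terms enter and leave the model. As an illustration of the backward stage described above, the sketch below runs plain greedy backward elimination of a logistic model under EBIC and records a path with similarly named components. This is a generic technique chosen for illustration, assuming that "removing insignificant terms" means repeatedly dropping the term whose removal most lowers EBIC; it is not SODA's implementation, and `backward_ebic()` is a hypothetical helper whose return value only mimics the shape of the `EBIC` and `Term` traces.

```r
# Greedy backward elimination under EBIC for a logistic model.
# Illustrative sketch only, not SODA's implementation.
# `Z`: matrix of candidate term columns (with column names); `p`: number of
# original variables used in the EBIC penalty.
backward_ebic <- function(Z, y, p, gam) {
  n <- length(y)
  score <- function(cols) {
    fit <- if (length(cols) == 0) {
      glm(y ~ 1, family = binomial())
    } else {
      glm(y ~ Z[, cols, drop = FALSE], family = binomial())
    }
    k <- length(cols)
    -2 * as.numeric(logLik(fit)) + k * log(n) + 2 * k * gam * log(p)
  }
  terms <- colnames(Z)
  current <- score(terms)
  path <- list(EBIC = current, Term = list(terms))
  while (length(terms) > 0) {
    # EBIC obtained by dropping each remaining term in turn
    cand <- vapply(terms, function(t) score(setdiff(terms, t)), numeric(1))
    best <- which.min(cand)
    if (cand[best] >= current) break   # no single removal improves EBIC
    terms <- setdiff(terms, terms[best])
    current <- unname(cand[best])
    path$EBIC <- c(path$EBIC, current)
    path$Term <- c(path$Term, list(terms))
  }
  path
}

# Usage: only column "a" drives the response.
set.seed(42)
n <- 300
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("a", "b", "c")))
y <- rbinom(n, 1, plogis(2 * X[, "a"]))
res <- backward_ebic(X, y, p = 3, gam = 0.5)
res$EBIC                   # strictly decreasing along the path
res$Term[[length(res$Term)]]
```

Each accepted removal strictly lowers EBIC, so the recorded trace is monotone; SODA's own trace also contains forward steps, so its `EBIC` component need not be monotone overall.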

### Author(s)

Yang Li, Jun S. Liu

### References

Li Y, Liu JS. (2015). Robust variable and interaction selection for high-dimensional classification via logistic regression. *Technical Report*.

### Examples

```r
# simulation study with 1 main effect and 2 second-order terms (uncomment the code to run)
#library(MASS)  # provides mvrnorm()
#N = 250
#p = 1000
#r = 0.5
#s = 1
#H = abs(outer(1:p, 1:p, "-"))
#S = s * r^H
#S[cbind(1:p, 1:p)] = S[cbind(1:p, 1:p)] * s
#xx = as.matrix(data.frame(mvrnorm(N, rep(0, p), S)))
#zz = 1 + xx[,1] - xx[,10]^2 + xx[,10]*xx[,20]
#yy = as.numeric(runif(N) < exp(zz) / (1 + exp(zz)))
#res_SODA = soda(xx, yy, gam = 0.5)
#cv_SODA = soda_trace_CV(xx, yy, res_SODA)
#cv_SODA

# Michigan lung cancer dataset (uncomment the code to run)
#data(mich_lung)
#res_SODA = soda(mich_lung_xx, mich_lung_yy, gam = 0.5)
#cv_SODA = soda_trace_CV(mich_lung_xx, mich_lung_yy, res_SODA)
#cv_SODA
```