# Variable Selection Deviation (VSD)

### Description

The package calculate the variable selection deviation (VSD) to measure the uncertainty of the selection in terms of inclusion of predictors in the model.

### Usage

1 2 3 4 5 6 |

### Arguments

`x` |
Matrix of predictors. |

`y` |
Response variable. |

`n_train` |
Size of training set when the weight function is ARM or ARM with prior. The default value is |

`no_rep` |
Number of replications when the weight function is ARM and ARM with prior. The default value is |

`n_train_bound` |
When computing the weights using |

`n_bound` |
When computing the weights using |

`model_check` |
The index of the model to be assessed by calculating the VSD measures. |

`psi` |
A positive number to control the improvement of the prior weight. The default value is 1. |

`family` |
Choose the family for GLM models. So far only |

`method` |
User chooses one of the |

`candidate_models` |
Only available when |

`weight_type` |
Options for computing weights for VSD measure. User chooses one of the |

`prior` |
Whether use prior in the weight function. The default is |

`reduce_bias` |
If the binomial model is used, occasionally the algorithm might has convergence issue when the problem of so-called complete separation or quasi-complete separation happens. Users can set |

### Details

See Reference section.

### Value

A "glmvsd" object is retured. The components are:

`VSD` |
Variable selection deviation (VSD) value. |

`VSD_minus` |
The lower VSD value of |

`VSD_plus` |
The upper VSD value of |

`weight` |
The weight for each candidate model. |

`DIFF` |
Counting the variable differences between candidate models and |

`candidate_models_cleaned` |
Cleaned candidate models: the duplicated candidate models are cleaned; When computing VSD weights using AIC and BIC, the models with more than n-2 variables are removed (n is the number of observaitons); When computing VSD weights using ARM, the models with more than n_train-2 variables are removed (n_train is the number of training observations). |

### References

Nan, Y. and Yang, Y. (2013), "Variable Selection Diagnostics Measures for High-dimensional Regression," *Journal of Computational and Graphical Statistics*, 23:3, 636-656.

http://dx.doi.org/10.1080/10618600.2013.829780

BugReport: https://github.com/emeryyi/glmvsd

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | ```
# REGRESSION CASE
# generate simulation data
n <- 50
p <- 8
beta <- c(3,1.5,0,0,2,0,0,0)
sigma <- matrix(0,p,p)
for(i in 1:p){
for(j in 1:p) sigma[i,j] <- 0.5^abs(i-j)
}
x <- mvrnorm(n, rep(0,p), sigma)
e <- rnorm(n)
y <- x %*% beta + e
# user provide a model to be checked
model_check <- c(0,1,1,1,0,0,0,1)
# compute VSD for model_check using ARM with prior
v_ARM <- glmvsd(x, y, n_train = ceiling(n/2),
no_rep=50, model_check = model_check, psi=1,
family = "gaussian", method = "union",
weight_type = "ARM", prior = TRUE)
# compute VSD for model_check using AIC
v_AIC <- glmvsd(x, y,
model_check = model_check,
family = "gaussian", method = "union",
weight_type = "AIC", prior = TRUE)
# compute VSD for model_check using BIC
v_BIC <- glmvsd(x, y,
model_check = model_check,
family = "gaussian", method = "union",
weight_type = "BIC", prior = TRUE)
# user supplied candidate models
candidate_models = rbind(c(0,0,0,0,0,0,0,1),
c(0,1,0,0,0,0,0,1), c(0,1,1,1,0,0,0,1),
c(0,1,1,0,0,0,0,1), c(1,1,0,1,1,0,0,0),
c(1,1,0,0,1,0,0,0))
v1_BIC <- glmvsd(x, y,
model_check = model_check, psi=1,
family = "gaussian",
method = "customize",
candidate_models = candidate_models,
weight_type = "BIC", prior = TRUE)
# CLASSIFICATION CASE
# generate simulation data
n = 300
p = 8
b <- c(1,1,1,-3*sqrt(2)/2)
x=matrix(rnorm(n*p, mean=0, sd=1), n, p)
feta=x[, 1:4]%*%b
fprob=exp(feta)/(1+exp(feta))
y=rbinom(n, 1, fprob)
# user provide a model to be checked
model_check <- c(0,1,1,1,0,0,0,1)
# compute VSD for model_check using BIC with prior
b_BIC <- glmvsd(x, y, n_train = ceiling(n/2),
family = "binomial",
no_rep=50, model_check = model_check, psi=1,
method = "union", weight_type = "BIC",
prior = TRUE)
candidate_models =
rbind(c(0,0,0,0,0,0,0,1),
c(0,1,0,0,0,0,0,1),
c(1,1,1,1,0,0,0,0),
c(0,1,1,0,0,0,0,1),
c(1,1,0,1,1,0,0,0),
c(1,1,0,0,1,0,0,0),
c(0,0,0,0,0,0,0,0),
c(1,1,1,1,1,0,0,0))
# compute VSD for model_check using AIC
# user supplied candidate models
b_AIC <- glmvsd(x, y,
family = "binomial",
model_check = model_check, psi=1,
method = "customize",
candidate_models = candidate_models,
weight_type = "AIC")
``` |