compare_ssnet: Fit Several Models and Compare
In jmleach-bst/ssnet: Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

compare_ssnet

R Documentation

Fit Several Models and Compare

Description

Fit glmnet and/or ssnet models and output measures of model fit for each. Allows multiple scale values for spike-and-slab models.

Usage

compare_ssnet(
  models = c("glmnet", "ss", "ss_iar"),
  alpha = c(0, 0.5, 1),
  model_fit = "all",
  variable_selection = FALSE,
  classify = FALSE,
  classify.rule = 0.5,
  type_error = "kcv",
  nfolds = 10,
  ncv = 1,
  foldid = NULL,
  fold.seed = NULL,
  s0 = seq(0.01, 0.1, 0.01),
  s1 = 1,
  B = NULL,
  x,
  y,
  family = "gaussian",
  offset = NULL,
  epsilon = 1e-04,
  maxit = 50,
  init = NULL,
  group = NULL,
  Warning = FALSE,
  verbose = FALSE,
  opt.algorithm = "LBFGS",
  iar.data = NULL,
  iar.prior = FALSE,
  p.bound = c(0.01, 0.99),
  tau.prior = "none",
  stan_manual = NULL,
  plot.pj = FALSE,
  im.res = NULL,
  nlambda = 100,
  type.measure = c("default", "mse", "deviance", "class", "auc", "mae", "C"),
  lambda.criteria = "lambda.min",
  output_param_est = FALSE,
  output_cv = FALSE
)

Arguments

`models`	A vector that determines which models to fit. Options include `c("glmnet", "ss", "ss_iar")`. The default is to fit all three models.
`alpha`	A scalar value between 0 and 1 determining the compromise between the Ridge and Lasso models. When `alpha = 1` reduces to the Lasso, and when `alpha = 0` reduces to Ridge.
`model_fit`	A vector containing measures of model fit to output. Options include `c("deviance", "mse", "mae")` for all models, and when `family = "binomial"`, also `c("auc", "misclassification")`. When `model_fit = "all"`, then all appropriate measures of model fit are output.
`variable_selection`	Logical. When `TRUE`, outputs the false discovery proportion (FDP), family-wise error (FWE), and power for the model. Requires that parameter vector `B` be specified. Default is `FALSE`, and is only appropriate for simulated data, when the true and false positives can be known.
`classify`	Logical. When `TRUE` and `family = "binomial"` applies a classification rule given by the argument `classify.rule`, and outputs accuracy, sensitivity, specificity, positive predictive value (ppv), and negative predictive value (npv).
`classify.rule`	A value between 0 and 1. For a given predicted value from a logistic regression, if the value is above `classify.rule`, then the predicted class is 1; otherwise the predicted class is 0. The default is 0.5.
`type_error`	Determines whether models are selected based on training error (`"training"`) or k-fold cross validated estimates of prediction error (`"kcv"`). Defaults to `"kcv"`, which is recommended because training error tends to underestimate the generalization error. See, e.g., Ch. 7 in \insertCiteHastie:2009ssnet.
`nfolds`	number of folds - default is 10. Although `nfolds` can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is `nfolds=3`
`ncv`	repeated number of cross-validation.
`foldid`	an optional vector of values between 1 and `nfolds` identifying what fold each observation is in. If supplied, `nfolds` can be missing.
`fold.seed`	A scalar seed value for cross validation; ensures the folds are the same upon re-running the function. Alternatively, use `foldid` to manually specify folds.
`s0, s1`	A vector of user-selected possible values for the spike scale and slab scale parameter, respectively. The default is `s0 = seq(0.01, 0.1, 0.01)` and `s1 = 1`. However, the user should select values informed by the practical context of the analysis.
`B`	When `variable_selection` is `TRUE`, a vector of "true" parameter values must be input in order to calculate the false discovery proportion (FDR), family-wise error (FWER), and power for the data. This vector should NOT contain a value for the intercept.
`x`	Design, or input, matrix, of dimension nobs x nvars; each row is an observation vector. It is recommended that `x` have user-defined column names for ease of identifying variables. If missing, then `colnames` are internally assigned `x1`, `x2`, ... and so forth.
`y`	Scalar response variable. Quantitative for `family = "gaussian"`, or `family = "poisson"` (non-negative counts). For `family = "gaussian"`, `y` is always standardized. For `family = "binomial"`, `y` should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For `family="cox"`, `y` should be a two-column matrix with columns named `'time'` and `'status'`. The latter is a binary variable, with `'1'` indicating death, and `'0'` indicating right censored. The function `Surv()` in package survival produces such a matrix. When `family = "multinomial"`, `y` follows the documentation for `glmnet`, but it is preferred that `y` is a factor with two or more levels.
`family`	Response type (see above).
`offset`	A vector of length `nobs` that is included in the linear predictor.
`epsilon`	A positive convergence tolerance; the iterations converge when `\|dev - dev_old\|/(\|dev\| + 0.1) < e`.
`maxit`	An integer giving the maximal number of EM iterations.
`init`	A vector of initial values for all coefficients (not for intercept). If not given, it will be internally produced. If `family = "multinomial"` and the same initializations are desired for each response/outcome category then `init` can be a vector. If different initializations are desired, then `init` should be a list, each element of which contains a vector of initializations. The list should be named according the response/outcome category as they appear in `y`.
`group`	A numeric vector, or an integer, or a list indicating the groups of predictors. If `group = NULL`, all the predictors form a single group. If `group = K`, the predictors are evenly divided into groups each with K predictors. If group is a numberic vector, it defines groups as follows: Group 1: `(group[1]+1):group[2]`, Group 2: `(group[2]+1):group[3]`, Group 3: `(group[3]+1):group[4]`, ... If group is a list of variable names, `group[[k]]` includes variables in the k-th group. The mixture double-exponential prior is only used for grouped predictors. For ungrouped predictors, the prior is double-exponential with scale `ss[2]` and mean 0. Note that grouped predictors when `family = "multinomial"` is still experimental, so use with caution.
`Warning`	Logical. If `TRUE`, shows the error messages of not convergence and identifiability.
`verbose`	Logical. If `TRUE`, prints out the number of iterations and computational time.
`opt.algorithm`	One of `c("LBFGS", "BFGS", "Newton")`. This argument determines which argument is used to optimize the term in the EM algorithm that estimates the probabilities of inclusion for each parameter. Optimization is performed by `optimizing`.
`iar.data`	A list of output from `mungeCARdata4stan` that contains the necessary inputs for the IAR prior. When unspecified, this is built internally assuming that neighbors are those variables directly above, below, left, and right of a given variable location. `im.res` must be specified when allowing this argument to be built internally. It is not recommended to use this argument directly, even when specifying a more complicated neighborhood stucture; this can be specified with the `adjmat` argument, and then internally converted to the correct format.
`iar.prior`	Logical. When `TRUE`, imposes intrinsic autoregressive prior on logit of the probabilities of inclusion. When `FALSE`, treats probabilities of inclusion as unstructured.
`p.bound`	A vector defining the lower and upper boundaries for the probabilities of inclusion in the model, respectively. Defaults to `c(0.01, 0.99)`.
`tau.prior`	One of `c("none", "manual", "cauchy")`. This argument determines the precision parameter in the Conditional Autoregressive model for the (logit of) prior inclusion probabilities. When `"none"`, the precision is set to 1; when "manual", the precision is manually entered by the user; when `"cauchy"`, the inverse precision is assumed to follow a Cauchy distribution with mean 0 and scale 2.5. Note that at this stage of development, only the `"none"` option has been extensively tested, so the other options should be used with caution.
`stan_manual`	A `stan_model` that is manually specified. Especially when fitting multiple models in succession, specifying the `stan` model outside this "loop" may avoid errors.
`plot.pj`	When `TRUE`, prints a series of 2D graphs of the prior probabilities of inclusion at each step of the algorithm. This should NOT be used for 3D data.
`im.res`	A 2-element vector where the first argument is the number of "rows" and the second argument is the number of "columns" in each subject's "image". Default is `NULL`.
`nlambda`	The number of `lambda` values - default is 100.
`type.measure`	loss to use for cross-validation. Currently five options, not all available for all models. The default is `type.measure="deviance"`, which uses squared-error for gaussian models (a.k.a `type.measure="mse"` there), deviance for logistic and poisson regression, and partial-likelihood for the Cox model. `type.measure="class"` applies to binomial and multinomial logistic regression only, and gives misclassification error. `type.measure="auc"` is for two-class logistic regression only, and gives area under the ROC curve. `type.measure="mse"` or `type.measure="mae"` (mean absolute error) can be used by all models except the `"cox"`; they measure the deviation from the fitted mean to the response. `type.measure="C"` is Harrel's concordance measure, only available for `cox` models.
`lambda.criteria`	Determines the model selection criteria. When `"lambda.min"` the final model is selected based on the penalty that minimizes the measure given in `type.measure`. When `"lambda.1se"` the final model is selected based on the smallest value of lambda that is within one standard error of the minimal measure given in `type.measure`.
`output_param_est`	Logical. When `TRUE` adds an element to the output list that includes parameter estimates for each model fit. Defaults to `FALSE`.
`output_cv`	Logical. When `TRUE`, generates a list with an element for all output generated by `cv.bh3`. Default is `FALSE`. The master list has an element for every combination of model, alpha, and s0. Each element is itself a list, the first element being the model fitness measures, and the second element the observed outcomes, fitted outcomes, linear predictor, and fold identifier. When `ncv > 1`, the latter three will consist of multiple columns, one for each run of k-fold CV; e.g., the column containing the fitted outcomes for the second CV run is `y.fitted_2`.
`criteria`	Specifies the criteria for model selection. Options are `"deviance"`, `"mse"`, `"mae"` for deviance, mean-square error, and mean absolute error, respectively. When `family = "binomial"`, additional options are `"auc"` and `"misclassification"`, for area under the ROC curve and the percentage of cases where the difference between the observed and predicted values is greater than 1/2.

Value

When output_param_est = FALSE returns a data frame of model fitness summaries. Otherwise, returns a list whose first element is a dataframe whose rows contain parameter estimates for each model fit, and whose second element is a dataframe of model fitness summaries.

Note

Models fit with 'glmnet' never select the penalty/tuning parameter using the training error; however, when type_error = "training", the measure used to compare 'glmnet' with the other models is based on prediction error estimates from training error. That is, model selection within 'glmnet' is still based on k-fold cross validation, even if comparisons with other models is not.

Examples

library(sim2Dpredictr)
## generate data (no intercept)
set.seed(4799623)

## sample size
n <- 30
## image dims
nr <- 4
nc <- 4

## generate data
cn <- paste0("x", seq_len(nr * nc))
tb <- rbinom(nr * nc, 1, 0.05)
tx <- matrix(rnorm(n * nr * nc), nrow = n, ncol = nr * nc,
             dimnames = list(seq_len(n), cn))
ty <- tx %*% tb + rnorm(n)

## build adjacency matrix
adjmat <- proximity_builder(im.res = c(nr, nc), type = "sparse")

## fit multiple models and compare
compare_ssnet(x = tx, y = ty, family = "gaussian",
              alpha = c(0.5, 1), s0 = c(0.01, 0.05),
              type_error = "kcv", nfolds = 3, im.res = c(4, 4),
              model_fit = "all", variable_selection = TRUE,
              B = tb)

jmleach-bst/ssnet documentation built on March 4, 2024, 5:04 p.m.

jmleach-bst/ssnet index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jmleach-bst/ssnet
Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

compare_ssnet: Fit Several Models and Compare
In jmleach-bst/ssnet: Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

Fit Several Models and Compare

Description

Usage

Arguments

Value

Note

Examples

Related to compare_ssnet in jmleach-bst/ssnet...

R Package Documentation

Browse R Packages

We want your feedback!

jmleach-bst/ssnet Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

compare_ssnet: Fit Several Models and Compare In jmleach-bst/ssnet: Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

Fit Several Models and Compare

Description

Usage

Arguments

Value

Note

Examples

Related to compare_ssnet in jmleach-bst/ssnet...

R Package Documentation

Browse R Packages

We want your feedback!

jmleach-bst/ssnet
Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters

compare_ssnet: Fit Several Models and Compare
In jmleach-bst/ssnet: Fit Spike-and-Slab Elastic Net GLM With Spatial Priors on Parameters