validate_ssnet: Train and evaluate models

View source: R/validate_ssnet.R

validate_ssnet    R Documentation

Train and evaluate models

Description

Fit a model using ssnet with training data and evaluate using test data.

Usage

validate_ssnet(
  model = "ss",
  alpha = 1,
  classify = FALSE,
  classify.rule = 0.5,
  type.multinomial = "grouped",
  s0 = 0.01,
  s1 = 1,
  x.train,
  y.train,
  x.test,
  y.test,
  family = "gaussian",
  offset = NULL,
  epsilon = 1e-04,
  maxit = 50,
  init = NULL,
  group = NULL,
  Warning = FALSE,
  verbose = FALSE,
  opt.algorithm = "LBFGS",
  iar.data = NULL,
  iar.prior = FALSE,
  adjmat = NULL,
  p.bound = c(0.01, 0.99),
  tau.prior = "none",
  tau.manual = NULL,
  stan_manual = NULL,
  nlambda = 100,
  lambda.criteria = "lambda.min",
  output_param_est = FALSE,
  output_probs = FALSE,
  print_check = FALSE
)

Arguments

model

Specify which model to fit. Options include c("glmnet", "ss", "ss_iar").

alpha

A scalar value between 0 and 1 determining the compromise between the Ridge and Lasso models. When alpha = 1, the penalty reduces to the Lasso; when alpha = 0, it reduces to Ridge.
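As an illustration of how alpha interpolates between the two penalties, the sketch below computes the elastic-net penalty in the parameterization used by glmnet. The function name enet_penalty is hypothetical and is not part of ssnet; it is only meant to show the two limiting cases.

```r
# Elastic-net penalty as parameterized in glmnet:
#   lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
# enet_penalty() is a hypothetical helper, not an ssnet function.
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(1, -2, 0)

lasso_pen <- enet_penalty(beta, lambda = 1, alpha = 1)    # L1 norm only: 3
ridge_pen <- enet_penalty(beta, lambda = 1, alpha = 0)    # half squared L2 norm: 2.5
mixed_pen <- enet_penalty(beta, lambda = 1, alpha = 0.5)  # equal mix: 2.75
```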

classify

Logical. When TRUE and family = "binomial", applies the classification rule given by the argument classify.rule and outputs accuracy, sensitivity, specificity, positive predictive value (ppv), and negative predictive value (npv).

classify.rule

A value between 0 and 1. For a given predicted value from a logistic regression, if the value is above classify.rule, then the predicted class is 1; otherwise the predicted class is 0. The default is 0.5.
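The rule and the reported measures can be sketched in base R as follows. The data are made up for illustration, and the variable names are assumptions; this is not the package's internal code.

```r
# Hypothetical predicted probabilities and observed 0/1 outcomes
pred_prob <- c(0.10, 0.45, 0.55, 0.80, 0.30, 0.95)
observed  <- c(0, 0, 1, 1, 1, 1)
classify_rule <- 0.5

# Predicted class is 1 when the predicted value is above the rule, else 0
pred_class <- ifelse(pred_prob > classify_rule, 1, 0)

# Cells of the 2 x 2 confusion table
tp <- sum(pred_class == 1 & observed == 1)
tn <- sum(pred_class == 0 & observed == 0)
fp <- sum(pred_class == 1 & observed == 0)
fn <- sum(pred_class == 0 & observed == 1)

measures <- c(
  accuracy    = (tp + tn) / length(observed),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  ppv         = tp / (tp + fp),
  npv         = tn / (tn + fn)
)
measures
```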

type.multinomial

If "grouped", a grouped lasso penalty is used on the multinomial coefficients for a variable. This ensures they are all in or out together. The default is "grouped".

s0, s1

A numeric value. When fitting a spike-and-slab model, s0 is the spike scale and s1 is the slab scale. Default is s0 = 0.01 and s1 = 1. When model = "glmnet", only s0 is used.

x.train, x.test

Design matrices for training and test data, respectively.

y.train, y.test

Response/outcome vectors for training and testing, respectively.

family

Response type, for example "gaussian", "binomial", or "multinomial" (see Examples).

offset

A vector of length nobs that is included in the linear predictor.

epsilon

A positive convergence tolerance; the iterations converge when |dev - dev_old|/(|dev| + 0.1) < epsilon.
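The stopping rule can be written as a small check. The helper name has_converged is hypothetical and the deviance values are invented; the expression itself is the one stated above.

```r
# Relative change in deviance compared against the tolerance.
# has_converged() is a hypothetical helper, not an ssnet function.
has_converged <- function(dev, dev_old, epsilon = 1e-04) {
  abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon
}

small_change <- has_converged(dev = 100.001, dev_old = 100)  # about 1e-05, below tolerance
large_change <- has_converged(dev = 100, dev_old = 101)      # about 0.01, above tolerance
```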

maxit

An integer giving the maximal number of EM iterations.

init

A vector of initial values for all coefficients (not for the intercept). If not given, it will be produced internally. If family = "multinomial" and the same initializations are desired for each response/outcome category, then init can be a vector. If different initializations are desired, then init should be a list, each element of which contains a vector of initializations. The list should be named according to the response/outcome categories as they appear in y.

group

A numeric vector, an integer, or a list indicating the groups of predictors. If group = NULL, all the predictors form a single group. If group = K, the predictors are evenly divided into groups each with K predictors. If group is a numeric vector, it defines groups as follows: Group 1: (group[1]+1):group[2], Group 2: (group[2]+1):group[3], Group 3: (group[3]+1):group[4], ... If group is a list of variable names, group[[k]] includes variables in the k-th group. The mixture double-exponential prior is only used for grouped predictors. For ungrouped predictors, the prior is double-exponential with scale ss[2] and mean 0. Note that grouping predictors when family = "multinomial" is still experimental, so use with caution.
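To make the numeric-vector form concrete, the sketch below expands a vector of cut points into the column indices of each group, following the rule stated above. The values are illustrative, and the expansion only mirrors the description rather than the package's internal code.

```r
# Cut points for 5 predictors: group 1 is columns 1:2, group 2 is columns 3:5
group <- c(0, 2, 5)

# Group k covers (group[k] + 1):group[k + 1]
groups <- lapply(
  seq_len(length(group) - 1),
  function(k) (group[k] + 1):group[k + 1]
)
groups
```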

Warning

Logical. If TRUE, shows messages about non-convergence and identifiability problems.

verbose

Logical. If TRUE, prints out the number of iterations and computational time.

opt.algorithm

One of c("LBFGS", "BFGS", "Newton"). This argument determines which algorithm is used to optimize the term in the EM algorithm that estimates the probabilities of inclusion for each parameter. Optimization is performed by the optimizing function.

iar.data

A list of output from mungeCARdata4stan that contains the necessary inputs for the IAR prior. When unspecified, this is built internally assuming that neighbors are those variables directly above, below, left, and right of a given variable location. im.res must be specified when allowing this argument to be built internally. It is not recommended to use this argument directly, even when specifying a more complicated neighborhood structure; that structure can be specified with the adjmat argument and is then converted internally to the correct format.

iar.prior

Logical. When TRUE, imposes an intrinsic autoregressive (IAR) prior on the logit of the probabilities of inclusion. When FALSE, treats the probabilities of inclusion as unstructured.

adjmat

A data.frame or matrix containing a "sparse" representation of the neighbor relationships. The first column should contain a numerical index for a given location. Each index will be repeated in this column for every neighbor it has. The indices for the location's neighbors are then specified in the second column. Any additional columns are ignored.
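As a sketch of this two-column layout, the code below builds the neighbor pairs for a 2 x 2 grid in which neighbors are the cells directly above, below, left, and right. The column names and the column-major numbering of locations are assumptions made for illustration; check how your locations are indexed before passing such an object as adjmat.

```r
# Hypothetical 2 x 2 grid; locations numbered column by column (an assumption)
nr <- 2
nc <- 2
idx <- matrix(seq_len(nr * nc), nrow = nr, ncol = nc)

pairs <- list()
for (i in seq_len(nr)) {
  for (j in seq_len(nc)) {
    # Candidate neighbors: above, below, left, right
    nb <- rbind(c(i - 1, j), c(i + 1, j), c(i, j - 1), c(i, j + 1))
    # Drop candidates that fall outside the grid
    keep <- nb[, 1] >= 1 & nb[, 1] <= nr & nb[, 2] >= 1 & nb[, 2] <= nc
    nb <- nb[keep, , drop = FALSE]
    # One row per (location, neighbor) pair; the location index is repeated
    pairs[[length(pairs) + 1]] <- data.frame(
      location = idx[i, j],
      neighbor = idx[nb]
    )
  }
}
adjmat <- do.call(rbind, pairs)
adjmat
```

Each location appears once in the first column for every neighbor it has, so a corner cell of this grid contributes two rows.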

p.bound

A vector defining the lower and upper boundaries for the probabilities of inclusion in the model, respectively. Defaults to c(0.01, 0.99).

tau.prior

One of c("none", "manual", "cauchy"). This argument determines the precision parameter in the Conditional Autoregressive model for the (logit of) prior inclusion probabilities. When "none", the precision is set to 1; when "manual", the precision is manually entered by the user; when "cauchy", the inverse precision is assumed to follow a Cauchy distribution with location 0 and scale 2.5. Note that at this stage of development, only the "none" option has been extensively tested, so the other options should be used with caution.

tau.manual

When tau.prior = "manual", use this argument to specify a common precision parameter.

stan_manual

A stan_model that is manually specified. Especially when fitting multiple models in succession, specifying the stan model outside this "loop" may avoid errors.

nlambda

The number of lambda values - default is 100.

lambda.criteria

Determines the model selection criterion. When "lambda.min", the final model is selected based on the penalty that minimizes the measure given in type.measure. When "lambda.1se", the final model is selected based on the largest value of lambda whose measure is within one standard error of the minimal measure given in type.measure.

output_param_est

Logical. When TRUE, adds an element to the output list that includes parameter estimates for the fitted model. Default is FALSE.

output_probs

Logical. When TRUE and family = "multinomial", adds an element to the output list that contains the probabilities of being a member of each category for each subject, in addition to their classification. Default is FALSE.

print_check

Logical. When TRUE, prints intermediate results.

Value

A list or a data frame. When output_param_est = FALSE, returns a data frame with a single row containing measures of model fitness. Otherwise, returns a list with 2 elements. The first element, model_fitness, contains a data frame with a single row containing measures of model fitness, and the second element, param_est, contains a data frame of parameter estimates.
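Because the return type depends on output_param_est, downstream code may want to handle both shapes. The helper below is a hypothetical sketch, demonstrated on mock objects whose column names (mse, variable, estimate) are invented for illustration; only the element names model_fitness and param_est come from the description above.

```r
# get_fitness() is a hypothetical helper, not an ssnet function.
# A data frame is also a list in R, so test is.data.frame() first.
get_fitness <- function(res) {
  if (is.data.frame(res)) res else res$model_fitness
}

# Mock return values with invented columns
res_df <- data.frame(mse = 1.2)
res_list <- list(
  model_fitness = data.frame(mse = 1.2),
  param_est     = data.frame(variable = c("x1", "x2"), estimate = c(0.5, 0))
)

fit_from_df   <- get_fitness(res_df)
fit_from_list <- get_fitness(res_list)
```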

Examples

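library(ssnet)

## simulate training and test design matrices with 5 predictors
xtr <- matrix(rnorm(100*5), nrow = 100, ncol = 5)
xte <- matrix(rnorm(100*5), nrow = 100, ncol = 5)
## true coefficients shared by both data sets
b <- rnorm(5)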

## continuous
ytr <- xtr %*% b + rnorm(100)
yte <- xte %*% b + rnorm(100)

validate_ssnet(
  model = "glmnet", family = "gaussian",
  x.train = xtr, x.test = xte,
  y.train = ytr, y.test = yte
)

validate_ssnet(
  model = "ss", family = "gaussian",
  x.train = xtr, x.test = xte,
  y.train = ytr, y.test = yte
)

## binary
ybtr <- ifelse(ytr > 0, 1, 0)
ybte <- ifelse(yte > 0, 1, 0)

validate_ssnet(
  model = "glmnet", family = "binomial",
  x.train = xtr, x.test = xte,
  y.train = ybtr, y.test = ybte,
  classify = TRUE, s0 = 0.1
)

validate_ssnet(
  model = "ss", family = "binomial",
  x.train = xtr, x.test = xte,
  y.train = ybtr, y.test = ybte,
  classify = TRUE, s0 = 0.05, s1 = 1
)

## multinomial outcome
ymtr <- dplyr::case_when(
  ytr > 1 ~ "a",
  ytr <= 1 & ytr > -1 ~ "b",
  ytr <= -1 ~ "c"
)
ymte <- dplyr::case_when(
  yte > 1 ~ "a",
  yte <= 1 & yte > -1 ~ "b",
  yte <= -1 ~ "c"
)

validate_ssnet(
  model = "glmnet", family = "multinomial",
  x.train = xtr, x.test = xte,
  y.train = ymtr, y.test = ymte,
  classify = TRUE, s0 = 0.1,
  output_param_est = TRUE
)

validate_ssnet(
  model = "ss", family = "multinomial",
  x.train = xtr, x.test = xte,
  y.train = ymtr, y.test = ymte,
  classify = TRUE, s0 = 0.1, s1 = 1,
  output_param_est = TRUE
)



jmleach-bst/ssnet documentation built on March 4, 2024, 5:04 p.m.