risk_mod: Fit an Integer Risk Score Model
In riskscores: Optimized Integer Risk Score Models

risk_mod

R Documentation

Description

Fits an optimized integer risk score model using a heuristic algorithm. Returns an object of class "risk_mod".

Usage

risk_mod(
  X,
  y,
  gamma = NULL,
  beta = NULL,
  weights = NULL,
  n_train_runs = 1,
  lambda0 = 0,
  a = -10,
  b = 10,
  max_iters = 10000,
  tol = 1e-05,
  shuffle = TRUE,
  seed = NULL,
  method = "annealscore"
)

Arguments

`X`	Input covariate matrix with dimension `n \times p`; every row is an observation.
`y`	Numeric vector for the (binomial) response variable.
`gamma`	Starting value to rescale coefficients for prediction (optional).
`beta`	Starting numeric vector with `p` coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.
`weights`	Numeric vector of length `n` with weights for each observation. By default, all observations are weighted equally.
`n_train_runs`	A positive integer giving the number of times to initialize and train the model. The run with the lowest objective function value on the training data is returned.
`lambda0`	Penalty coefficient for L0 term (default: 0). See `cv_risk_mod()` for `lambda0` tuning.
`a`	Integer lower bound for coefficients (default: -10).
`b`	Integer upper bound for coefficients (default: 10).
`max_iters`	Maximum number of iterations (default: 10000).
`tol`	Tolerance for convergence (default: 1e-5).
`shuffle`	Whether order of coefficients is shuffled during coordinate descent (default: TRUE).
`seed`	An integer passed to `set.seed()` to seed the random number generator. By default, no seed is set.
`method`	A string that specifies which method to run: "riskcd" (cyclical coordinate descent) or "annealscore" (simulated annealing). Default: "annealscore".
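
The default for `beta` is described above only as "rounded coefficients from a logistic regression model". The following base R sketch shows one way such a starting vector could be built. The simulated data, the `scale_factor` rescaling, and the clipping step are assumptions made for illustration; this is not the code used by `risk_mod()`.

```r
# Hypothetical illustration in base R; not the riskscores implementation.
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * X[, "x1"] - 0.8 * X[, "x3"]))

a <- -10   # integer lower bound
b <- 10    # integer upper bound

# An unpenalized logistic regression supplies real-valued coefficients.
fit <- glm(y ~ X, family = binomial())
coefs <- coef(fit)[-1]   # drop the intercept

# Assumed rescaling: stretch the largest coefficient to the tighter bound,
# then round to integers and clip to [a, b].
scale_factor <- min(abs(a), abs(b)) / max(abs(coefs))
beta_start <- round(coefs * scale_factor)
beta_start <- pmin(pmax(beta_start, a), b)
beta_start
```

A vector built this way could be supplied through the `beta` argument, provided its length and ordering match what `risk_mod()` expects for `X`.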

Details

This function uses either a cyclical coordinate descent algorithm or a simulated annealing algorithm to solve the following optimization problem, where the bounds `a` and `b` are the arguments described above:

\min_{\gamma,\beta} \quad -\frac{1}{n} \sum_{i=1}^{n} (\gamma y_i x_i^T \beta - \log(1 + \exp(\gamma x_i^T \beta))) + \lambda_0 \sum_{j=1}^{p} 1(\beta_{j} \neq 0)

a \le \beta_j \le b \; \; \; \forall j = 1,2,...,p

\beta_j \in \mathbb{Z} \; \; \; \forall j = 1,2,...,p

\beta_0, \gamma \in \mathbb{R}

The L0 penalty encourages a sparse model, and the constraints restrict each coefficient to an integer within the specified bounds.
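
To make the objective concrete, the sketch below evaluates it in base R for a given `gamma` and `beta`. It assumes the loss term is the average negative log-likelihood of a logistic model and that every entry of `beta` is counted by the L0 penalty. `risk_objective()` is a hypothetical helper written for this illustration and is not exported by riskscores.

```r
# Hypothetical helper: penalized objective for a candidate (gamma, beta).
risk_objective <- function(X, y, gamma, beta, lambda0 = 0) {
  eta <- gamma * as.vector(X %*% beta)   # scaled linear predictor
  # log(1 + exp(eta)), computed in a numerically stable way
  log1pexp <- pmax(eta, 0) + log1p(exp(-abs(eta)))
  nll <- -mean(y * eta - log1pexp)       # average negative log-likelihood
  # Assumption: the L0 term counts every nonzero entry of beta.
  nll + lambda0 * sum(beta != 0)
}

X <- cbind(1, c(0, 1, 2, 3))   # first column acts as an intercept
y <- c(0, 0, 1, 1)

# With all coefficients at zero, every predicted probability is 0.5,
# so the objective equals log(2).
risk_objective(X, y, gamma = 1, beta = c(0, 0))

# Two nonzero coefficients add 2 * lambda0 to the loss.
risk_objective(X, y, gamma = 0.5, beta = c(-3, 2), lambda0 = 0.01)
```

Comparing this value across candidate integer vectors is the kind of comparison both search methods rely on, whatever their internal details.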

Value

An object of class "risk_mod" with the following attributes:

`gamma`	Final scalar value.
`beta`	Vector of integer coefficients.
`glm_mod`	Logistic regression object of class "glm" (see stats::glm).
`X`	Input covariate matrix.
`y`	Input response vector.
`weights`	Input weights.
`lambda0`	Input `lambda0` value.
`model_card`	Data frame displaying the nonzero integer coefficients (i.e. "points") of the risk score model.
`score_map`	Data frame containing a column of possible scores and a column with each score's associated risk probability.
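
As a rough illustration of how a table like `score_map` relates scores to risks, the base R sketch below maps each reachable score to a probability. The values of `gamma`, `intercept`, and `card_points`, the 0/1 covariates, the column names, and the formula `plogis(gamma * (intercept + score))` are all assumptions for this example; the fitted object may compute and label its table differently.

```r
# Hypothetical values; in practice these would come from a fitted model.
gamma <- 0.25
intercept <- -6                                   # assumed real-valued intercept
card_points <- c(age_over_60 = 3, smoker = 2, prior_event = 4)

# Assuming binary (0/1) covariates, enumerate every covariate pattern
# and keep the distinct total scores.
combos <- expand.grid(rep(list(0:1), length(card_points)))
scores <- sort(unique(as.vector(as.matrix(combos) %*% card_points)))

# Assumed link: logistic function of the rescaled score.
risk <- plogis(gamma * (intercept + scores))

score_map_sketch <- data.frame(Score = scores, Risk = risk)
score_map_sketch
```

Because `gamma` is positive here, the risk rises with the score, which is what lets a reader look up a probability from a total of points.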

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod1 <- risk_mod(X, y)
mod1$model_card

mod2 <- risk_mod(X, y, lambda0 = 0.01)
mod2$model_card

mod3 <- risk_mod(X, y, lambda0 = 0.01, a = -5, b = 5, method = "riskcd")
mod3$model_card

riskscores documentation built on June 8, 2025, 10:27 a.m.