risk_mod: Fit an Integer Risk Score Model

View source: R/risk_mod.R

Description

Fits an optimized integer risk score model using a heuristic algorithm. Returns an object of class "risk_mod".

Usage

risk_mod(
  X,
  y,
  gamma = NULL,
  beta = NULL,
  weights = NULL,
  n_train_runs = 1,
  lambda0 = 0,
  a = -10,
  b = 10,
  max_iters = 10000,
  tol = 1e-05,
  shuffle = TRUE,
  seed = NULL,
  method = "annealscore"
)

Arguments

X

Input covariate matrix with dimension n \times p; every row is an observation.

y

Numeric vector for the (binomial) response variable.

gamma

Starting value to rescale coefficients for prediction (optional).

beta

Starting numeric vector with p coefficients. Default starting coefficients are rounded coefficients from a logistic regression model.

weights

Numeric vector of length n with weights for each observation. By default, all observations receive equal weight.

n_train_runs

A positive integer giving the number of times to initialize and train the model. The run with the lowest objective function value on the training data is returned.

lambda0

Penalty coefficient for L0 term (default: 0). See cv_risk_mod() for lambda0 tuning.

a

Integer lower bound for coefficients (default: -10).

b

Integer upper bound for coefficients (default: 10).

max_iters

Maximum number of iterations (default: 10000).

tol

Tolerance for convergence (default: 1e-5).

shuffle

Whether order of coefficients is shuffled during coordinate descent (default: TRUE).

seed

An integer passed to set.seed() to seed the random number generator. By default, no seed is set.

method

A string that specifies which method to run, either "riskcd" or "annealscore" (default: "annealscore").
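The default for beta is described above as rounded coefficients from a logistic regression model. A minimal base R sketch of that idea follows. The helper name init_beta is hypothetical and not part of the package, and clipping the rounded values to [a, b] is an assumption made here so the starting point respects the bounds; the package may rescale or initialize differently.

```r
# Hypothetical helper (not a riskscores function): integer starting values
# obtained by rounding logistic regression coefficients.
init_beta <- function(X, y, a = -10, b = 10) {
  # X is assumed to already contain any intercept column it needs
  fit <- stats::glm.fit(x = X, y = y, family = stats::binomial())
  coefs <- unname(fit$coefficients)
  coefs[is.na(coefs)] <- 0          # aliased columns start at zero
  beta <- round(coefs)              # round to the nearest integer
  pmin(pmax(beta, a), b)            # assumption: clip to the bounds [a, b]
}

# Simulated data for illustration only
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
X_sim <- cbind(intercept = 1, x1 = x1, x2 = x2)
y_sim <- rbinom(n, 1, plogis(0.5 + 2 * x1 - x2))

beta_start <- init_beta(X_sim, y_sim, a = -5, b = 5)
beta_start
```

A vector built this way could be passed as the beta argument, provided its length matches the number of columns of X.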

Details

This function uses either a cyclical coordinate descent algorithm (method = "riskcd") or a simulated annealing algorithm (method = "annealscore") to solve the following optimization problem.

\min_{\gamma,\beta} \quad -\frac{1}{n} \sum_{i=1}^{n} \left( \gamma y_i x_i^T \beta - \log(1 + \exp(\gamma x_i^T \beta)) \right) + \lambda_0 \sum_{j=1}^{p} 1(\beta_{j} \neq 0)

a \le \beta_j \le b \; \; \; \forall j = 1,2,...,p

\beta_j \in \mathbb{Z} \; \; \; \forall j = 1,2,...,p

\beta_0, \gamma \in \mathbb{R}

Here a and b are the integer bounds given by the arguments a and b, and \lambda_0 is the penalty coefficient lambda0. The L0 penalty encourages a sparse model, while the constraints restrict the coefficients to bounded integers.
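To make the objective concrete, the sketch below evaluates it in base R for a given gamma and beta, written as a loss to be minimized (the negative average log-likelihood plus the L0 penalty). The function risk_objective is a hypothetical helper, not part of the package, and it ignores observation weights and applies the penalty to every entry of beta; both are simplifying assumptions.

```r
# Hypothetical helper (not a riskscores function): penalized logistic loss
# for a scaled integer coefficient vector.
risk_objective <- function(X, y, gamma, beta, lambda0 = 0) {
  eta <- gamma * as.vector(X %*% beta)      # scaled linear predictor
  # log(1 + exp(eta)), computed in a numerically stable way
  log1pexp <- ifelse(eta > 0, eta + log1p(exp(-eta)), log1p(exp(eta)))
  nll <- -mean(y * eta - log1pexp)          # negative average log-likelihood
  nll + lambda0 * sum(beta != 0)            # add the L0 penalty
}

# Tiny illustrative data set (not from the package)
X_toy <- matrix(c(1, 0,
                  0, 1,
                  1, 1,
                  0, 0), ncol = 2, byrow = TRUE)
y_toy <- c(1, 0, 1, 0)

obj_sparse <- risk_objective(X_toy, y_toy, gamma = 1, beta = c(2, 0), lambda0 = 0.01)
obj_null   <- risk_objective(X_toy, y_toy, gamma = 1, beta = c(0, 0), lambda0 = 0.01)
c(sparse = obj_sparse, null = obj_null)
```

In this toy case the one-coefficient model has a lower objective than the all-zero model, so the penalty of 0.01 for the extra nonzero coefficient is outweighed by the better fit.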

Value

An object of class "risk_mod" with the following attributes:

gamma

Final scalar value.

beta

Vector of integer coefficients.

glm_mod

Logistic regression object of class "glm" (see stats::glm).

X

Input covariate matrix.

y

Input response vector.

weights

Input weights.

lambda0

Input lambda0 value.

model_card

Dataframe displaying the nonzero integer coefficients (i.e. "points") of the risk score model.

score_map

Dataframe containing a column of possible scores and a column with each score's associated risk probability.
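The relationship between a score and its risk probability can be sketched from the logistic form used in the objective, where the probability is the logistic function of gamma times the linear predictor. The example below is an illustration under assumptions, not the package's implementation: the values of gamma and beta are made up, the first entry of beta is treated as an intercept, and score_to_risk is a hypothetical helper.

```r
# Hypothetical helper (not a riskscores function): convert an integer score
# to a risk probability, assuming risk = logistic(gamma * (intercept + score)).
score_to_risk <- function(score, gamma, intercept) {
  plogis(gamma * (intercept + score))
}

# Made-up values for illustration; not output from a fitted model
gamma_toy <- 0.5
beta_toy  <- c(-3, 2, 1)                      # intercept, then two point values
X_feat    <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))

# Each row's score is the sum of the points for its covariates
score <- as.vector(X_feat %*% beta_toy[-1])

# One row per distinct score, sorted by score
score_map_toy <- unique(data.frame(
  score = score,
  risk  = score_to_risk(score, gamma_toy, beta_toy[1])
))
score_map_toy <- score_map_toy[order(score_map_toy$score), ]
score_map_toy
```

Because gamma is positive here, the risk increases with the score, which is what makes a table of this shape readable as a lookup from points to probability.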

Examples

y <- breastcancer[[1]]
X <- as.matrix(breastcancer[,2:ncol(breastcancer)])

mod1 <- risk_mod(X, y)
mod1$model_card

mod2 <- risk_mod(X, y, lambda0 = 0.01)
mod2$model_card

mod3 <- risk_mod(X, y, lambda0 = 0.01, a = -5, b = 5, method = "riskcd")
mod3$model_card

riskscores documentation built on June 8, 2025, 10:27 a.m.