process_results: Evaluate a policy
In PLUCR: Policy Learning Under Constraint

Description

This function evaluates the optimal policy derived from `theta` and returns an upper bound for the constraint estimator. It first updates `mu0` and `nu0` following the estimation step of the alternating optimization procedure. This update enables targeted estimation of the three quantities of interest (the risk, the constraint, and the main objective) and yields a consistent upper bound for the constraint estimator.

Usage

process_results(
  theta,
  X,
  A,
  Y,
  Xi,
  mu0,
  nu0,
  prop_score,
  lambda,
  alpha = 0.1,
  beta = 0.05,
  centered = FALSE
)

Arguments

`theta`	A numeric matrix (k x d). Each row comes from a Frank-Wolfe (FW) inner minimization and is used to recover an extremal point when constructing the convex function.
`X`	A matrix of covariates of size n x d (input data in `[0,1]`).
`A`	A binary vector or matrix of length n indicating treatment assignment (0 or 1).
`Y`	A numeric vector or matrix of length n representing primary outcomes (in `[0,1]`).
`Xi`	A numeric vector or matrix of length n indicating adverse events (0 or 1).
`mu0`	A fold-specific function predicting primary outcome (Y) given treatment (A) and covariates (X).
`nu0`	A fold-specific function predicting adverse event outcome (Xi) given treatment (A) and covariates (X).
`prop_score`	A function that estimates the propensity score given treatment (A) and covariates (X).
`lambda`	A non-negative numeric scalar controlling the penalty for violating the constraint.
`alpha`	A numeric scalar representing the constraint tolerance (in `[0,1/2]`, 0.1 by default).
`beta`	A non-negative numeric scalar controlling the sharpness of the probability function.
`centered`	A logical value indicating whether to apply centering in `sigma_beta` (FALSE by default).
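
The argument descriptions above can be turned into a small set of simulated inputs. The following is a minimal sketch, not an example shipped with the package: the sample sizes, the data-generating choices, and the `toy_*` nuisance functions are all hypothetical, and the `(a, x)` argument order of those functions is an assumption, since the documentation only says they take treatment and covariates. The call to `process_results` is therefore guarded and wrapped in `try()`; it may need adapting to the signatures the package actually expects.

```r
# Hypothetical sizes and data; none of these values come from the package.
set.seed(1)
n <- 200   # number of observations
d <- 3     # number of covariates
k <- 2     # number of rows in theta

X  <- matrix(runif(n * d), nrow = n, ncol = d)            # covariates in [0,1]
A  <- rbinom(n, size = 1, prob = 0.5)                     # treatment, 0 or 1
Y  <- plogis(X[, 1] + 0.5 * A + rnorm(n, sd = 0.3))       # primary outcome in [0,1]
Xi <- rbinom(n, size = 1, prob = 0.1 + 0.2 * A * X[, 2])  # adverse event, 0 or 1
theta <- matrix(rnorm(k * d), nrow = k, ncol = d)         # k x d matrix

# Toy nuisance functions. ASSUMPTION: they are called as f(a, x).
toy_mu0        <- function(a, x) plogis(x[, 1] + 0.5 * a)
toy_nu0        <- function(a, x) 0.1 + 0.2 * a * x[, 2]
toy_prop_score <- function(a, x) rep(0.5, nrow(x))

lambda <- 1      # penalty, non-negative
alpha  <- 0.1    # constraint tolerance, in [0, 1/2]
beta   <- 0.05   # sharpness, non-negative

# Check the inputs against the ranges stated in the Arguments section.
stopifnot(
  is.matrix(theta), ncol(theta) == ncol(X),
  all(X >= 0 & X <= 1),
  length(A) == nrow(X), all(A %in% c(0, 1)),
  length(Y) == nrow(X), all(Y >= 0 & Y <= 1),
  length(Xi) == nrow(X), all(Xi %in% c(0, 1)),
  lambda >= 0, alpha >= 0, alpha <= 0.5, beta >= 0
)

# Guarded call: runs only if PLUCR is installed, and try() keeps the script
# going if the toy function signatures do not match what the package expects.
if (requireNamespace("PLUCR", quietly = TRUE)) {
  res <- try(
    PLUCR::process_results(
      theta = theta, X = X, A = A, Y = Y, Xi = Xi,
      mu0 = toy_mu0, nu0 = toy_nu0, prop_score = toy_prop_score,
      lambda = lambda, alpha = alpha, beta = beta, centered = FALSE
    ),
    silent = TRUE
  )
}
```

Running the checks before the call makes range violations (for example, covariates that were not rescaled to `[0,1]`) fail early with a clear message rather than inside the estimation step.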