In banditsCI: Bandit-Based Experiments and Policy Evaluation

ridge_muhat_lfo_pai

R Documentation

Leave-future-out ridge-based estimates for arm expected rewards.

Description

Computes leave-future-out ridge-based estimates of arm expected rewards from the provided data.

Usage

ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes, alpha = 1)

Arguments

`xs`	Matrix. Covariates of shape `[A, p]`, where `A` is the number of observations and `p` is the number of features. Must not contain NA values.
`ws`	Integer vector. The arm chosen for the observation at each time `t`. Length `A`. Must not contain NA values.
`yobs`	Numeric vector. Observed outcomes, length `A`. Must not contain NA values.
`K`	Integer. Number of arms. Must be a positive integer.
`batch_sizes`	Integer vector. Sizes of batches in which data is processed. Must be positive integers.
`alpha`	Numeric. Ridge regression regularization parameter. Default is 1.
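
The constraints listed above can be checked before the call. The following is a minimal sketch, not part of banditsCI: `check_ridge_inputs` is a hypothetical helper, and two of its checks are assumptions rather than documented requirements (that arms are labelled `1..K`, as in the Examples, and that `batch_sizes` sums to `A`).

```r
# Hypothetical helper (not part of banditsCI): checks the documented
# constraints on the arguments before calling ridge_muhat_lfo_pai().
check_ridge_inputs <- function(xs, ws, yobs, K, batch_sizes) {
  stopifnot(
    is.matrix(xs), !anyNA(xs),
    length(ws) == nrow(xs), !anyNA(ws),
    length(yobs) == nrow(xs), !anyNA(yobs),
    length(K) == 1, K >= 1, K == round(K),
    all(batch_sizes >= 1), all(batch_sizes == round(batch_sizes))
  )
  # Assumption: arms are labelled 1..K, as in the Examples section.
  stopifnot(all(ws %in% seq_len(K)))
  # Assumption: the batches partition the A observations.
  stopifnot(sum(batch_sizes) == nrow(xs))
  invisible(TRUE)
}
```

Calling `check_ridge_inputs(xs, ws, yobs, K, batch_sizes)` stops with an error naming the first failed condition, and returns `TRUE` invisibly otherwise.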

Value

A 3D array containing the expected reward estimates for each arm and each time `t`, of shape `[A, A, K]`.
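
To make the `[A, A, K]` shape concrete, here is a rough base-R sketch of one way a leave-future-out ridge estimate could be organised. It is not the banditsCI implementation, and it rests on several assumptions: the first index is the time step, the second indexes the observation whose covariates are scored, the model for a batch is fit on all observations up to and including that batch, and the intercept is penalized along with the other coefficients. The name `lfo_ridge_sketch` is hypothetical.

```r
# Illustrative sketch only: NOT the banditsCI implementation.
lfo_ridge_sketch <- function(xs, ws, yobs, K, batch_sizes, alpha = 1) {
  A <- nrow(xs)
  X <- cbind(1, xs)                        # add an intercept column
  muhat <- array(NA_real_, dim = c(A, A, K))
  ends <- cumsum(batch_sizes)              # last time step of each batch
  starts <- c(1, head(ends, -1) + 1)       # first time step of each batch
  for (b in seq_along(ends)) {
    seen <- seq_len(ends[b])               # assumption: nothing after batch b is used
    for (w in seq_len(K)) {
      idx <- seen[ws[seen] == w]           # observations of arm w seen so far
      Xw <- X[idx, , drop = FALSE]
      # Closed-form ridge solution: (X'X + alpha * I)^{-1} X'y
      coef <- solve(crossprod(Xw) + alpha * diag(ncol(X)),
                    crossprod(Xw, yobs[idx]))
      # The same fitted model is used for every time step in batch b
      # and evaluated at every observation's covariates.
      muhat[starts[b]:ends[b], , w] <-
        matrix(X %*% coef, nrow = batch_sizes[b], ncol = A, byrow = TRUE)
    }
  }
  muhat
}
```

Under these assumptions, `muhat[t, s, w]` is the predicted reward of arm `w` at the covariates of observation `s`, using a model that has seen no data beyond the batch containing `t`.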

Examples

library(banditsCI)
set.seed(123)
p <- 3
K <- 5
A <- 100
xs <- matrix(runif(A * p), nrow = A, ncol = p)
ws <- sample(1:K, A, replace = TRUE)
yobs <- runif(A)
batch_sizes <- c(25, 25, 25, 25)
muhat <- ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes)
# The result has A * A * K entries, so inspect its dimensions rather than printing it in full.
dim(muhat)

banditsCI documentation built on April 12, 2025, 1:42 a.m.