WpProj: p-Wasserstein Linear Projections

View source: R/WpProj.R

WpProjR Documentation

p-Wasserstein Linear Projections

Description

[Experimental] This function will calculate linear projections from a set of predictions into the space of the covariates in terms of the p-Wasserstein distance.

Usage

WpProj(
  X,
  eta = NULL,
  theta = NULL,
  power = 2,
  method = c("L1", "binary program", "stepwise", "simulated annealing", "L0"),
  solver = c("lasso", "ecos", "lpsolve", "mosek"),
  options = NULL
)

Arguments

X

An n \times p matrix of covariates

eta

An n \times s matrix of predictions from a model

theta

An optional An p \times s parameter matrix for selection methods. Only makes sense if the original model is a linear model.

power

The power of the Wasserstein distance to use. Must be ⁠>= 1.0⁠. Will default to 2.0.

method

The algorithm to calculate the Wasserstein projections. One of "L1", "binary program", "IP", "stepwise","simulated annealing", or "L0". Will default to "L1" if not provided. See details for more information.

solver

Which solver to use? One of "lasso", "ecos", "lpsolve", or "mosek". See details for more information

options

Options passed to the particular method and desired solver. See details for more information.

Details

Methods

The WpProj function is a wrapper for the various Wasserstein projection methods. It is designed to be a one-stop shop for all Wasserstein projection methods. It will automatically choose the correct method and solver based on the arguments provided. It will also return a standardized output for all methods. Each method has its own set of options that can be passed to it. See the documentation for each method for more information.

For the L1 methods, see L1_method_options() for more information. For the binary program methods, see binary_program_method_options() for more information. For the stepwise methods, see stepwise_method_options() for more information. For the simulated annealing methods, see simulated_annealing_method_options() for more information.

In most cases, we recommend using the L1 methods or binary program methods. The L1 methods are the fastest and applicable to Wasserstein powers of any value greater than 1 and function as direct linear projections into the space of the covariates. The binary program methods instead preserve the coefficients of the original model if this is of interest, such as when the original model was already a linear model. The binary program will instead function as a way of turning on and off certain coefficients in a way that minimizes the Wasserstein distance between reduced and original models. Of note, we also have available an approximate binary program method using a lasso solver. This method is faster than the exact binary program method but is not guaranteed to find the optimal solution. It is recommended to use the exact binary program method if possible. See binary_program_method_options() for more information on how to set up the approximate method as some arguments for the lasso solver should be specified. For more information on how this works, please also see the referenced paper.

The stepwise, simulated annealing, and L0 methods also select covariates like the binary program methods but they can be slower. They are presented merely for comparison purposes given they were used in the original paper.

Wasserstein distances and powers

The Wasserstein distance is a measure of distance between two probability distributions. It is defined as:

W_p(\mu,\nu) = \left(\inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x-y\|^p d\pi(x,y)\right)^{1/p},

where \Pi(\mu,\nu) is the set of all joint distributions with marginals \mu and \nu. The Wasserstein distance is a generalization of the Euclidean distance, which is the case when p=2. In our function we have argument power that corresponds to the p of the equation above. The default power is 2.0 but any value greater than or equal to 1.0 is allowed. For more information, see the references.

The particular implementation of the Wasserstein distance is as follows. If \mu is the original prediction from the original model, then we seek to find a new prediction \nu that minimizes the Wasserstein distance between the two: \text{argmin}_\nu W_p(\mu,\nu).

Value

object of class WpProj, which is a list with the following slots:

  • call: The call to the function

  • theta: A list of the final parameter matrices for each returned model

  • fitted.values: A list of the fitted values for each returned model

  • power: The power of the Wasserstein distance used

  • method: The method used to calculate the Wasserstein projections

  • solver: The solver used to calculate the Wasserstein projections

  • niter: The number of iterations used to calculate the Wasserstein projections. Not all methods return a number of iterations so this may be NULL

  • nzero: The number of non zero coefficients in the final models

References

Dunipace, Eric and Lorenzo Trippa (2020) https://arxiv.org/abs/2012.09999.

Examples

if(rlang::is_installed("stats")) {
# note we don't generate believable data with real posteriors
# these examples are just to show how to use the function
n <- 32
p <- 10
s <- 21

# covariates and coefficients
x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p )
beta <- (1:10)/10

#outcome
y <- x %*% beta + stats::rnorm(n)

# fake posterior
post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1)
post_mu <- x %*% post_beta #posterior predictive distributions

# fit models
## L1 model
fit.p2     <-  WpProj(X=x, eta=post_mu, power = 2.0,
                   method = "L1", #default
                   solver = "lasso" #default
)

## approximate binary program
fit.p2.bp <-  WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0,
                   method = "binary program",
                   solver = "lasso" #default because approximate algorithm is faster
)

## compare performance by measuring distance from full model
dc <- distCompare(models = list("L1" = fit.p2, "BP" = fit.p2.bp))
plot(dc)

## compare performance by measuring the relative distance between a null model 
## and the predictions of interest as a pseudo R^2
r2.expect <- WPR2(predictions = post_mu, projected_model = dc) # can have negative values
r2.null  <- WPR2(projected_model = dc) # should be between 0 and 1
plot(r2.null)

## we can also examine how predictions change in the models for individual observations
ridgePlot(fit.p2, index = 21, minCoef = 0, maxCoef = 10)
}

WpProj documentation built on May 29, 2024, 7:55 a.m.