WR: Wasserstein regression
In yqgchen/WR: Wasserstein regression for distributional data

WR	R Documentation

Wasserstein regression

Description

Wasserstein regression with (1) distributional and/or scalar predictors and distributional responses, (2) distributional (and scalar) predictors and scalar responses.

Usage

WR(X, Y, qSup, optns = list(), FPCAoptnsX = list(), FPCAoptnsY = list())

Arguments

`X`	A list whose elements consist of distributional predictors and/or scalar predictors. Each distributional predictor has to be represented by a `q`-by-`n` matrix whose columns consist of quantile functions of `n` realizations of this distributional predictor, which are evaluated on a common grid `qSup` on `[0,1]` consisting of `q` points. Each scalar predictor (if any) has to be represented by a vector of length `n`.
`Y`	Distributional or scalar responses. A distributional response has to be represented by a `q`-by-`n` matrix whose columns consist of quantile functions of `n` realizations, which are evaluated on a common grid `qSup` on `[0,1]` consisting of `q` points. A scalar response has to be represented by a vector of length `n`.
`qSup`	A vector of length `q` holding the common support grid of quantile functions, of which the elements are in a strictly increasing order and lie between 0 and 1.
`optns`	A list of optional control parameters for regression specified by `list(name=value)`. See 'Details'.
`FPCAoptnsX`, `FPCAoptnsY`	Lists of optional control parameters for functional principal component analysis (FPCA) of log mapped distributional predictors and responses, respectively. See 'Details' of `FPCA`. For example, one can specify the methods of choosing numbers of functional principal components (FPCs) through `methodSelectK` and set up Fraction-of-Variance-Explained (FVE) thresholds through `FVEthreshold` when using FVE to choose numbers of FPCs. Default of `methodSelectK`: `'FVE'`; Default of `FVEthreshold`: 0.95 (different from `FPCA`). Default of `useBinnedData`: `'OFF'` (different from `FPCA`).

Details

Available control options are

methodProj: The projection method if the fitted values of log mapped distributional responses lie out of the log image space: 'log' (the method as per Chen, Lin, and Müller, 2021); 'qt' (default, the method as per Petersen & Müller, 2019).
lower: A scalar specifying the lower bound of the support of responses if they are distributional, only relevant if methodProj is 'qt'. Default: NULL, i.e., no finite lower bound.
upper: A scalar specifying the upper bound of of the support of responses if they are distributional, only relevant if methodProj is 'qt'. Default: NULL, i.e., no finite upper bound.

Value

A list of the following:

`Yfit`	Fitted responses; either a vector of length `n` for scalar responses or a `q`-by-`n` matrix holding the quantile functions for distributional responses, evaluated on `qSup`.
`qSup`	The input `qSup`.
`beta`	A list of fitted coefficient functions, `j`-th element corresponding to `j`-th predictor in `X`. Elements take the following form: (1) a scalar, if the response and the corresponding predictor are both scalar; (2) a vector holding the fitted coefficient function evaluated on `workGridX`, if the response is scalar and the corresponding predictor is distributional; (3) a vector holding the fitted coefficient function evaluated on `workGridY`, if the response is distributional and the corresponding predictor is scalar; (4) a matrix holding the fitted coefficient function, where (`j`,`k`)-th entry holds the value evaluated at (`workGridX[j]`,`workGridY[k]`), if the response and the corresponding predictor are both distributional.
`workGridX`	A list of elements, which are either vectors holding a working grid for elements of `beta` for a distributional predictor, or `NULL` for a scalar predictor.
`workGridY`	A vector holding a working grid for elements of `beta` for a distributional response; `NULL` for a scalar response.
`fpcaLogX`	A list of FPCA objects holding the FPCA output for each log mapped distributional predictors; Or `NULL` if all the predictors are scalar.
`fpcaLogY`	An FPCA object holding the FPCA output for log mapped responses if the responses are distributional; `NULL` if the responses are scalar.
`X`	The input `X`.
`Y`	The input `Y`.
`outOfLogSpace`	A logical vector holding whether initial fitted log mapped distributional responses lie out of the log image space (`TRUE`) or not (`FALSE`); `NULL` if the responses are scalar.
`optns`	A list of options actually used.
`FPCAoptnsX`	A list of FPCA options actually used for distributional predictors; `NULL` if all the predictors are scalar.
`FPCAoptnsY`	A list of FPCA options actually used for distributional responses; `NULL` if the responses are scalar.

References

Chen, Y., Lin, Z., & Müller, H.-G. (2021). "Wasserstein regression." Journal of the American Statistical Association, in press. Petersen, A., & Müller, H.-G. (2019). "Fréchet regression for random objects with Euclidean predictors." The Annals of Statistics, 47(2), 691–719.

Examples


# X = N( mu1, sigma1^2 ), Y = N( mu2, sigma2^2 )
# E( mu2 | mu1, sigma1 ) = E(mu2) + b11 ( mu1 - E(mu1) ) + b12 ( sigma1 - E(sigma1) )
# E( sigma2 | mu1, sigma1 ) = E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) )
# E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) ) >= 0 a.s.
# Then beta(s,t) = b11 + b12 ( s - E(mu1) ) / E(sigma1)  + b21 ( t - E(mu2) ) / E(sigma2) + 
# b22 ( ( s - E(mu1) ) / E(sigma1) ) ( ( t - E(mu2) ) / E(sigma2) )
bMat <- matrix( c( 0.5, 0.5, 0.5, -0.5 ), ncol = 2, byrow = TRUE )
n <- 100
# mu1 ~ Beta(2,2)
mu1 <- rbeta( n, 2, 2 ); Emu1 <- 0.5
# sigma1 ~ Uniform(0.5,1.5)
sigma1 <- runif( n, 0.5, 1.5 ); Esigma1 <- 1
# mu2 = E(mu2) + b11 ( mu1 - E(mu1) ) + b12 ( sigma1 - E(sigma1) ) + err_{mu2}
Emu2 <- 2
err_mu2 <- rnorm( n, mean = 0, sd = 0.1 ) # unexplained stochastic part
mu2True <- Emu2 + bMat[1,1] * ( mu1 - Emu1 ) + bMat[1,2] * ( sigma1 - Esigma1 )
mu2 <- mu2True + err_mu2
# sigma2 = E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) ) + err_{sigma2}
Esigma2 <- 2
err_sigma2 <- ( rbeta( n, 2, 2 ) - 0.5 ) / 5 # unexplained stochastic part
sigma2True <- Esigma2 + bMat[2,1] * ( mu1 - Emu1 ) + bMat[2,2] * ( sigma1 - Esigma1 )
sigma2 <- sigma2True + err_sigma2

nqSup <- 1000
qSup <- seq( 0, 1, length.out = nqSup+1 )
qSup <- ( qSup[-1] + qSup[-(nqSup+1)] ) / 2
X <- list(
  X1 = sapply( seq_len(n), function (i) { 
    qnorm( qSup, mean = mu1[i], sd = sigma1[i] ) 
  })
)
QXmean <- list(
  qnorm( qSup, mean = Emu1, sd = Esigma1 )
)
QYmean <- qnorm( qSup, mean = Emu2, sd = Esigma2 )
Y <- sapply( seq_len(n), function (i) { 
  qnorm( qSup, mean = mu2[i], sd = sigma2[i] ) 
})
res <- WR( X = X, Y = Y, qSup = qSup )

yqgchen/WR documentation built on June 10, 2025, 6:04 p.m.