WR: Wasserstein regression

WRR Documentation

Wasserstein regression

Description

Wasserstein regression with (1) distributional and/or scalar predictors and distributional responses, (2) distributional (and scalar) predictors and scalar responses.

Usage

WR(X, Y, qSup, optns = list(), FPCAoptnsX = list(), FPCAoptnsY = list())

Arguments

X

A list whose elements consist of distributional predictors and/or scalar predictors. Each distributional predictor has to be represented by a q-by-n matrix whose columns consist of quantile functions of n realizations of this distributional predictor, which are evaluated on a common grid qSup on [0,1] consisting of q points. Each scalar predictor (if any) has to be represented by a vector of length n.

Y

Distributional or scalar responses. A distributional response has to be represented by a q-by-n matrix whose columns consist of quantile functions of n realizations, which are evaluated on a common grid qSup on [0,1] consisting of q points. A scalar response has to be represented by a vector of length n.

qSup

A vector of length q holding the common support grid of quantile functions, of which the elements are in a strictly increasing order and lie between 0 and 1.

optns

A list of optional control parameters for regression specified by list(name=value). See 'Details'.

FPCAoptnsX, FPCAoptnsY

Lists of optional control parameters for functional principal component analysis (FPCA) of log mapped distributional predictors and responses, respectively. See 'Details' of FPCA. For example, one can specify the methods of choosing numbers of functional principal components (FPCs) through methodSelectK and set up Fraction-of-Variance-Explained (FVE) thresholds through FVEthreshold when using FVE to choose numbers of FPCs. Default of methodSelectK: 'FVE'; Default of FVEthreshold: 0.95 (different from FPCA). Default of useBinnedData: 'OFF' (different from FPCA).

Details

Available control options are

methodProj

The projection method if the fitted values of log mapped distributional responses lie out of the log image space: 'log' (the method as per Chen, Lin, and Müller, 2021); 'qt' (default, the method as per Petersen & Müller, 2019).

lower

A scalar specifying the lower bound of the support of responses if they are distributional, only relevant if methodProj is 'qt'. Default: NULL, i.e., no finite lower bound.

upper

A scalar specifying the upper bound of of the support of responses if they are distributional, only relevant if methodProj is 'qt'. Default: NULL, i.e., no finite upper bound.

Value

A list of the following:

Yfit

Fitted responses; either a vector of length n for scalar responses or a q-by-n matrix holding the quantile functions for distributional responses, evaluated on qSup.

qSup

The input qSup.

beta

A list of fitted coefficient functions, j-th element corresponding to j-th predictor in X. Elements take the following form: (1) a scalar, if the response and the corresponding predictor are both scalar; (2) a vector holding the fitted coefficient function evaluated on workGridX, if the response is scalar and the corresponding predictor is distributional; (3) a vector holding the fitted coefficient function evaluated on workGridY, if the response is distributional and the corresponding predictor is scalar; (4) a matrix holding the fitted coefficient function, where (j,k)-th entry holds the value evaluated at (workGridX[j],workGridY[k]), if the response and the corresponding predictor are both distributional.

workGridX

A list of elements, which are either vectors holding a working grid for elements of beta for a distributional predictor, or NULL for a scalar predictor.

workGridY

A vector holding a working grid for elements of beta for a distributional response; NULL for a scalar response.

fpcaLogX

A list of FPCA objects holding the FPCA output for each log mapped distributional predictors; Or NULL if all the predictors are scalar.

fpcaLogY

An FPCA object holding the FPCA output for log mapped responses if the responses are distributional; NULL if the responses are scalar.

X

The input X.

Y

The input Y.

outOfLogSpace

A logical vector holding whether initial fitted log mapped distributional responses lie out of the log image space (TRUE) or not (FALSE); NULL if the responses are scalar.

optns

A list of options actually used.

FPCAoptnsX

A list of FPCA options actually used for distributional predictors; NULL if all the predictors are scalar.

FPCAoptnsY

A list of FPCA options actually used for distributional responses; NULL if the responses are scalar.

References

Chen, Y., Lin, Z., & Müller, H.-G. (2021). "Wasserstein regression." Journal of the American Statistical Association, in press. Petersen, A., & Müller, H.-G. (2019). "Fréchet regression for random objects with Euclidean predictors." The Annals of Statistics, 47(2), 691–719.

Examples


# X = N( mu1, sigma1^2 ), Y = N( mu2, sigma2^2 )
# E( mu2 | mu1, sigma1 ) = E(mu2) + b11 ( mu1 - E(mu1) ) + b12 ( sigma1 - E(sigma1) )
# E( sigma2 | mu1, sigma1 ) = E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) )
# E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) ) >= 0 a.s.
# Then beta(s,t) = b11 + b12 ( s - E(mu1) ) / E(sigma1)  + b21 ( t - E(mu2) ) / E(sigma2) + 
# b22 ( ( s - E(mu1) ) / E(sigma1) ) ( ( t - E(mu2) ) / E(sigma2) )
bMat <- matrix( c( 0.5, 0.5, 0.5, -0.5 ), ncol = 2, byrow = TRUE )
n <- 100
# mu1 ~ Beta(2,2)
mu1 <- rbeta( n, 2, 2 ); Emu1 <- 0.5
# sigma1 ~ Uniform(0.5,1.5)
sigma1 <- runif( n, 0.5, 1.5 ); Esigma1 <- 1
# mu2 = E(mu2) + b11 ( mu1 - E(mu1) ) + b12 ( sigma1 - E(sigma1) ) + err_{mu2}
Emu2 <- 2
err_mu2 <- rnorm( n, mean = 0, sd = 0.1 ) # unexplained stochastic part
mu2True <- Emu2 + bMat[1,1] * ( mu1 - Emu1 ) + bMat[1,2] * ( sigma1 - Esigma1 )
mu2 <- mu2True + err_mu2
# sigma2 = E(sigma2) + b21 ( mu1 - E(mu1) ) + b22 ( sigma1 - E(sigma1) ) + err_{sigma2}
Esigma2 <- 2
err_sigma2 <- ( rbeta( n, 2, 2 ) - 0.5 ) / 5 # unexplained stochastic part
sigma2True <- Esigma2 + bMat[2,1] * ( mu1 - Emu1 ) + bMat[2,2] * ( sigma1 - Esigma1 )
sigma2 <- sigma2True + err_sigma2

nqSup <- 1000
qSup <- seq( 0, 1, length.out = nqSup+1 )
qSup <- ( qSup[-1] + qSup[-(nqSup+1)] ) / 2
X <- list(
  X1 = sapply( seq_len(n), function (i) { 
    qnorm( qSup, mean = mu1[i], sd = sigma1[i] ) 
  })
)
QXmean <- list(
  qnorm( qSup, mean = Emu1, sd = Esigma1 )
)
QYmean <- qnorm( qSup, mean = Emu2, sd = Esigma2 )
Y <- sapply( seq_len(n), function (i) { 
  qnorm( qSup, mean = mu2[i], sd = sigma2[i] ) 
})
res <- WR( X = X, Y = Y, qSup = qSup )



yqgchen/WR documentation built on June 10, 2025, 6:04 p.m.