This is the main function in the CVE package. It creates objects of class "cve" to estimate the mean subspace. Helper functions that require a "cve" object can then be applied to the output of this function.
Conditional Variance Estimation (CVE) is a sufficient dimension reduction (SDR) method for regressions studying E(Y|X), the conditional expectation of a response Y given a set of predictors X. This function provides methods for estimating the dimension and the subspace spanned by the columns of a p x k matrix B of minimal rank k such that
E(Y|X) = E(Y|B'X)
or, equivalently,
Y = g(B'X) + ε
where X is independent of ε and has positive definite variance-covariance matrix Var(X) = Σ_X, ε is a mean-zero random variable with finite variance Var(ε) = E(ε^2), g is an unknown, continuous, non-constant function, and B = (b_1, ..., b_k) is a real p x k matrix of rank k <= p.
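The reduction E(Y|X) = E(Y|B'X) means that the regression function depends on X only through B'X. The following base R sketch (not part of the CVE package; B and the link g are taken from the Examples section) checks this numerically: shifting the predictors along any direction orthogonal to span(B) leaves g(B'X) unchanged.

```r
# Dimensions, B and link g as in the Examples section
p <- 5
k <- 2
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
g <- function(u) u[, 1]^2 + 2 * u[, 2]   # g(u1, u2) = u1^2 + 2 * u2

# The last p - k columns of a complete QR basis of B span the
# orthogonal complement of span(B)
V <- qr.Q(qr(B), complete = TRUE)[, (k + 1):p, drop = FALSE]

set.seed(1)
x <- matrix(rnorm(10 * p), 10, p)
# Shift each observation by a random linear combination of the columns of V
z <- matrix(rnorm(10 * (p - k)), 10, p - k)
x.shift <- x + z %*% t(V)

max(abs(crossprod(V, B)))                # numerically zero
all.equal(g(x %*% B), g(x.shift %*% B))  # TRUE: g(B'X) is unchanged
```

Any change in X that lies in the orthogonal complement of span(B) is therefore invisible to the mean function, which is why only span(B), and not X itself, needs to be recovered.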
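Both the dimension k and the subspace span(B) are unknown. The CVE method makes very few assumptions; in particular, it does not require the linearity and constant variance conditions on the predictors that inverse regression methods rely on.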
The method estimates a matrix Bhat whose column space should be close to the mean subspace span(B). The primary output is Bhat, a set of orthonormal vectors (a matrix with orthonormal columns) whose span estimates span(B).
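Because only span(B) is identified, any basis of the same subspace is an equally valid answer, and an orthonormal basis is a convenient representative. The base R sketch below (using the B from the Examples section) orthonormalizes B with a QR decomposition and checks that the projection matrix onto span(B) does not change. It also shows that this particular B is not itself semi-orthogonal, since t(b1) %*% b2 = -1/p for odd p.

```r
p <- 5
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)

# Orthonormal basis of span(B): the thin Q factor of a QR decomposition
B.orth <- qr.Q(qr(B))

# Projection onto span(B): general formula vs. orthonormal shortcut
P.general <- B %*% solve(crossprod(B)) %*% t(B)
P.orth <- tcrossprod(B.orth)

crossprod(B)       # not the identity: off-diagonal entries are -1/p
crossprod(B.orth)  # identity up to rounding
isTRUE(all.equal(P.general, P.orth, check.attributes = FALSE))  # TRUE
```

The projection matrix is the basis-free summary of a subspace, so comparing projections (as the Examples do with the Frobenius norm) is the natural way to compare Bhat with B.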
The method "central" implements Ensemble Conditional Variance Estimation (ECVE) as described in [2]. It augments the CVE method by applying an ensemble of functions (parameter func_list) to the response in order to estimate the central subspace. This corresponds to the generalization
F(Y|X) = F(Y|B'X)
or, equivalently,
Y = g(B'X, ε)
where F is the conditional cumulative distribution function.
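The description of func_list leaves the form of the ensemble open. As one plausible choice (an assumption, not necessarily what the package uses by default), indicator functions f_q(y) = 1{y <= q} at quantiles q of Y satisfy E(f_q(Y) | X) = F(q | X), which links the ensemble directly to the conditional CDF F. The base R sketch below builds such an ensemble; make.indicator.ensemble and func.list are hypothetical names, and the exact structure that func_list expects is not shown here.

```r
# Hypothetical helper: indicator functions 1{y <= q} at the given quantiles
make.indicator.ensemble <- function(y, probs) {
  qs <- quantile(y, probs = probs, names = FALSE)
  # lapply forces q, so each closure keeps its own quantile
  lapply(qs, function(q) function(y) as.numeric(y <= q))
}

set.seed(21)
y <- rnorm(100)
func.list <- make.indicator.ensemble(y, probs = c(0.25, 0.5, 0.75))

# Each column is one transformed response f(Y); its conditional mean
# given X is the conditional CDF F(q | X) at the corresponding q
Y.ens <- sapply(func.list, function(f) f(y))
dim(Y.ens)       # 100 3
colMeans(Y.ens)  # 0.25 0.50 0.75
```

Applying mean-subspace estimation to each column of Y.ens and combining the results is the idea behind moving from E(Y|X) to the full conditional distribution F(Y|X).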
Argument | Description
formula | an object of class "formula" (see formula), e.g. y ~ x
data | an optional data frame containing the data for the formula if supplied like …
method | character string specifying the method of fitting. Options include "mean" (the default), "weighted.mean", and "central"
max.dim | upper bound for k
... | optional parameters passed on to …
An S3 object of class "cve" with components:
- the design matrix of the predictor vector used to compute the CVE estimate,
- the n-dimensional vector of responses used to compute the CVE estimate,
- the name of the method used,
- the matched call,
- a list of components V, L, B, loss, h for each k = min.dim, ..., max.dim. If k was supplied in the call, min.dim = max.dim = k.
  - B: the CVE estimate, a p x k matrix.
  - V: a p x (p - k) matrix whose columns span the orthogonal complement of span(B).
  - L: the loss for each sample separately, such that its mean is loss.
  - loss: the value of the minimized target function, evaluated at V.
  - h: the bandwidth parameter used to calculate B, V, loss, and L.
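The relation between the per-sample loss L and loss can be illustrated with a base R sketch of a CVE-type target function, following our reading of the construction in [1]: each observation X_j serves as a reference point, every X_i is weighted by a kernel of its squared distance to the affine subspace X_j + span(V), L_j is the resulting weighted variance of Y, and loss is the mean of the L_j. The Gaussian kernel, the bandwidth h = 1, and the name cve.loss.sketch are assumptions for illustration; the package's internal kernel scaling, bandwidth selection, and the weighting used by "weighted.mean" may differ.

```r
# Sketch of a CVE-type target function (assumptions: Gaussian kernel
# exp(-z^2 / 2) applied to d / h; every observation is a reference point)
cve.loss.sketch <- function(X, Y, V, h) {
  n <- nrow(X)
  Lj <- numeric(n)
  for (j in seq_len(n)) {
    D <- sweep(X, 2, X[j, ])                  # rows are X_i - X_j
    # squared distance of X_i to the affine subspace X_j + span(V)
    d <- rowSums(D^2) - rowSums((D %*% V)^2)
    w <- exp(-(d / h)^2 / 2)
    w <- w / sum(w)                           # weights sum to one
    # weighted variance of Y around the local weighted mean
    Lj[j] <- sum(w * Y^2) - sum(w * Y)^2
  }
  list(L = Lj, loss = mean(Lj))
}

# Simulated data as in the Examples section
p <- 5
k <- 2
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
n <- 100
set.seed(21)
x <- matrix(rnorm(n * p), n, p)
y <- drop((x %*% b1)^2 + 2 * (x %*% b2) + 0.25 * rnorm(n))

# V.true spans the orthogonal complement of span(B); V.rand is a random
# orthonormal p x (p - k) matrix for comparison
V.true <- qr.Q(qr(B), complete = TRUE)[, (k + 1):p]
V.rand <- qr.Q(qr(matrix(rnorm(p * (p - k)), p, p - k)))

h <- 1  # hypothetical bandwidth, chosen only for illustration
fit.true <- cve.loss.sketch(x, y, V.true, h)
fit.rand <- cve.loss.sketch(x, y, V.rand, h)

all.equal(mean(fit.true$L), fit.true$loss)  # loss is the mean of L
# the loss at V.true is expected, though not guaranteed, to be smaller
c(true = fit.true$loss, random = fit.rand$loss)
```

Minimizing such a loss over orthonormal V and taking B as the orthogonal complement of the minimizer is the idea the components V, B, L, and loss reflect.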
[1] Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>
[2] Fertl, L. and Bura, E. (2021) "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.13435>
For a detailed description of formula, see formula.
library(CVE)
# set dimensions for simulation model
p <- 5
k <- 2
# create B for simulation
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
# sample size
n <- 100
set.seed(21)
# create predictor data x ~ N(0, I_p)
x <- matrix(rnorm(n * p), n, p)
# simulate response variable
# y = f(B'x) + err
# with f(x1, x2) = x1^2 + 2 * x2 and err ~ N(0, 0.25^2)
y <- (x %*% b1)^2 + 2 * (x %*% b2) + 0.25 * rnorm(n)
# calculate cve with method 'mean' for unknown k up to max.dim = 2
cve.obj.s <- cve(y ~ x, max.dim = 2) # default method 'mean'
# calculate cve with method 'weighted.mean' for k = 2
cve.obj.w <- cve(y ~ x, k = 2, method = 'weighted.mean')
B2 <- coef(cve.obj.s, k = 2)
# get projected X data (same as cve.obj.s$X %*% B2)
proj.X <- directions(cve.obj.s, k = 2)
# plot y against projected data
plot(proj.X[, 1], y)
plot(proj.X[, 2], y)
# create 10 new x points and y according to model
x.new <- matrix(rnorm(10 * p), 10, p)
y.new <- (x.new %*% b1)^2 + 2 * (x.new %*% b2) + 0.25 * rnorm(10)
# predict y.new
yhat <- predict(cve.obj.s, x.new, 2)
plot(y.new, yhat)
# projection matrix onto span(B)
# B is not semi-orthogonal here: b1 and b2 have unit length, but
# t(b1) %*% b2 = -1/p for odd p, so B %*% t(B) is not a projection
PB <- B %*% solve(t(B) %*% B) %*% t(B)
# cve estimates for B with mean and weighted method
B.s <- coef(cve.obj.s, k = 2)
B.w <- coef(cve.obj.w, k = 2)
# same as B.s %*% t(B.s) since B.s is semi-orthogonal (same for B.w)
PB.s <- B.s %*% solve(t(B.s) %*% B.s) %*% t(B.s)
PB.w <- B.w %*% solve(t(B.w) %*% B.w) %*% t(B.w)
# compare estimation accuracy of mean and weighted cve estimate by
# Frobenius norm of difference of projections.
norm(PB - PB.s, type = 'F')
norm(PB - PB.w, type = 'F')