| predict.drf | R Documentation |
Obtain predictions from a DRF forest object. For any point x in the predictor space, it returns the estimate of the conditional distribution
P(Y | X=x) represented as a weighted distribution \sum_i w_i y_i of the training observations {y_i}.
Additionally, this function also provides support to directly obtain estimates of certain target quantities \tau(P(Y | X)),
such as e.g. conditional quantiles, variances or correlations.
## S3 method for class 'drf'
predict(
object,
newdata = NULL,
functional = NULL,
transformation = NULL,
custom.functional = NULL,
num.threads = NULL,
estimate.uncertainty = FALSE,
...
)
object |
Trained DRF forest object. |
newdata |
Points at which predictions should be made. If NULL, returns out-of-bag
predictions on the training set (i.e., for every training point |
functional |
Optional. String indicating the statistical functional that we want to compute from the weights. One option between:
|
transformation |
An optional transformation function that is applied to the responses before computing the target functional. It helps to extend the functionality to a much wider range of targets.
The responses are not transformed by default, i.e. the identity function |
custom.functional |
A user-defined function when |
num.threads |
Number of threads used for computing. If set to NULL, the software automatically selects an appropriate amount. |
estimate.uncertainty |
Whether to additionally return B weight vectors calculated on B CI groups for each sample. See Details and return value docu. |
... |
additional parameters. |
To estimate the uncertainty, a set of B=(num.trees)/(ci.group.size)
weights is produced for each sample when estimate.uncertainty=TRUE.
These B weights arise from B subforests (CI groups)
inside the estimation routine and may be seen as bootstrap
approximation to the estimation uncertainty of the DRF estimator. As such, they
can be used to build confidence intervals for functionals. For instance, for
univariate functionals, one may calculate one functional per weight to obtain
B estimates, with which the variance can be calculated. Then the usual
normal approximation can be used to construct confidence intervals for said functional.
Uncertainty weights are not available OOB.
If functional is NULL, returns a list containing
y |
the matrix of training responses |
weights |
the matrix of weights, whose number of rows corresponds
the number of rows of |
If estimate.uncertainty=TRUE, additionally
weights.uncertainty |
a list of length |
If functional is specified, the desired quantity is returned, in the format described above.
library(drf)
n = 10000
p = 20
d = 3
# Generate training data
X = matrix(rnorm(n * p), nrow=n)
Y = matrix(rnorm(n * d), nrow=n)
Y[, 1] = Y[, 1] + X[, 1]
Y[, 2] = Y[, 2] * X[, 2]
Y[, 3] = Y[, 3] * X[, 1] + X[, 2]
# Fit DRF object
drf.forest = drf(X, Y)
# Generate test data
X_test = matrix(rnorm(10 * p), nrow=10)
out = predict(drf.forest, newdata=X_test)
# Compute E[Y_1 | X] for all data in X_test directly from
# the weights representing the estimated distribution
out$weights %*% out$y[,1]
out = predict(drf.forest, newdata=X_test,
functional='mean')
# Compute E[Y_1 | X] for all data in X_test using built-in functionality
out[,1]
out = predict(drf.forest, newdata=X_test,
functional='quantile',
quantiles=c(0.25, 0.75),
transformation=function(y){y[1] * y[2] * y[3]})
# Compute 25% and 75% quantiles of Y_1*Y_2*Y_3, conditionally on X = X_test[1, ]
out[1,,]
out = predict(drf.forest, newdata=X_test,
functional='cov',
transformation=function(y){matrix(1:6, nrow=2) %*% y})
# Compute 2x2 covariance matrix for (1*Y_1 + 3*Y_2 + 5*Y_3, 2*Y_1 + 4*Y_2 + 6*Y_3),
# conditionally on X = X_test[1, ]
out[1,,]
out = predict(drf.forest, newdata=X_test,
functional='custom',
custom.functional=function(y, w){c(sum(y[, 1] * w), sum(y[, 2] * w))})
# Compute E[Y_1, Y_2 | X] for all data in X_test by providing custom functional that
# computes it from the weights
out
## UNCERTAINTY WEIGHTS ######################################################
# Simulate Data that experiences both a mean as well as sd shift
set.seed(10)
n<-1000
# Simulate from X
x1 <- runif(n,-1,1)
x2 <- runif(n,-1,1)
x3 <- x1+ runif(n,-1,1)
X0 <- matrix(runif(7*n,-1,1), nrow=n, ncol=7)
X <- cbind(x1,x2, x3, X0)
colnames(X)<-NULL
# Simulate dependent variable Y
Y <- as.matrix(rnorm(n,mean = 0.8*(x1 > 0), sd = 1 + 1*(x2 > 0)))
# Fit DRF with 50 CI groups, each 20 trees large. This results in 50 uncertainty weights
DRF <- drf(X=X, Y=Y,num.trees=1000, min.node.size = 5, ci.group.size=1000/50)
# Obtain Test point
x<-matrix(c(0.2, 0.4, runif(8,-1,1)), nrow=1, ncol=10)
DRFpred<-predict(DRF, newdata=x, estimate.uncertainty=TRUE)
## Sample from P_{Y| X=x}
Yxs<-Y[sample(1:n, size=n, replace = T, DRFpred$weights[1,])]
# Calculate quantile prediction as weighted quantiles from Y
qx <- quantile(Yxs, probs = c(0.05,0.95))
# Calculate conditional mean prediction
mux <- mean(Yxs)
# True quantiles
q1<-qnorm(0.05, mean=0.8 * (x[1] > 0), sd=(1+(x[2] > 0)))
q2<-qnorm(0.95, mean=0.8 * (x[1] > 0), sd=(1+(x[2] > 0)))
mu<-0.8 * (x[1] > 0)
# Calculate uncertainty
alpha<-0.05
B<-nrow(DRFpred$weights.uncertainty[[1]])
qxb<-matrix(NaN, nrow=B, ncol=2)
muxb<-matrix(NaN, nrow=B, ncol=1)
for (b in 1:B){
Yxsb<-Y[sample(1:n, size=n, replace = T, DRFpred$weights.uncertainty[[1]][b,])]
qxb[b,] <- quantile(Yxsb, probs = c(0.05,0.95))
muxb[b] <- mean(Yxsb)
}
CI.lower.q1 <- qx[1] - qnorm(1-alpha/2)*sqrt(var(qxb[,1]))
CI.upper.q1 <- qx[1] + qnorm(1-alpha/2)*sqrt(var(qxb[,1]))
CI.lower.q2 <- qx[2] - qnorm(1-alpha/2)*sqrt(var(qxb[,2]))
CI.upper.q2 <- qx[2] + qnorm(1-alpha/2)*sqrt(var(qxb[,2]))
CI.lower.mu <- mux - qnorm(1-alpha/2)*sqrt(var(muxb))
CI.upper.mu <- mux + qnorm(1-alpha/2)*sqrt(var(muxb))
hist(Yxs, prob=T)
z<-seq(-6,7,by=0.01)
d<-dnorm(z, mean=0.8 * (x[1] > 0), sd=(1+(x[2] > 0)))
lines(z,d, col="darkred" )
abline(v=q1,col="darkred" )
abline(v=q2, col="darkred" )
abline(v=qx[1], col="darkblue")
abline(v=qx[2], col="darkblue")
abline(v=mu, col="darkred")
abline(v=mux, col="darkblue")
abline(v=CI.lower.q1, col="darkblue", lty=2)
abline(v=CI.upper.q1, col="darkblue", lty=2)
abline(v=CI.lower.q2, col="darkblue", lty=2)
abline(v=CI.upper.q2, col="darkblue", lty=2)
abline(v=CI.lower.mu, col="darkblue", lty=2)
abline(v=CI.upper.mu, col="darkblue", lty=2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.