Description
This function takes a training data.frame and an optional testing data.frame and performs posterior sampling. It returns posterior draws of the mean regression line, E[Y | X], for the training and test sets. The function is built for continuous outcomes.

fDPMix() differs from NDPMix in two ways. First, NDPMix returns draws from the posterior *predictive* distribution of the outcome, whereas fDPMix() returns the regression line. The regression line has the same mean as the NDPMix predictions, but lower variance. Second, the back-end computation is different: fDPMix() evaluates the regression line at a point X as a weighted average of the cluster-specific regressions evaluated at X. This is faster than NDPMix, which takes a Monte Carlo approach: it assigns X to one of the cluster-specific regressions, then draws a predicted outcome from that cluster's regression.
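To illustrate the difference between the two computations, here is a minimal base-R sketch for a single posterior iteration. It assumes a normal linear regression within each cluster; the three clusters, their weights at the point x0, and their coefficients and residual SDs are all hypothetical numbers, not output of fDPMix() or NDPMix. In the actual model the weights would in general depend on x0.

```r
# Hypothetical single posterior iteration: 3 clusters, each with a normal
# linear regression y ~ N(b0[k] + b1[k] * x, sigma[k]^2).
set.seed(42)
x0    <- 0.5                 # point at which to evaluate the regression
w     <- c(0.2, 0.5, 0.3)    # hypothetical cluster weights at x0 (sum to 1)
b0    <- c(-1, 0, 2)         # hypothetical cluster intercepts
b1    <- c(0.5, 1, -0.5)     # hypothetical cluster slopes
sigma <- c(0.3, 0.2, 0.4)    # hypothetical cluster residual SDs

# Cluster-specific regressions evaluated at x0
cluster_means <- b0 + b1 * x0

# fDPMix-style: weighted average of the cluster-specific regression values
reg_line <- sum(w * cluster_means)

# NDPMix-style Monte Carlo: assign x0 to a cluster, then draw an outcome
n_draws <- 1e5
k       <- sample(seq_along(w), n_draws, replace = TRUE, prob = w)
y_pred  <- rnorm(n_draws, mean = cluster_means[k], sd = sigma[k])

reg_line        # 0.625
mean(y_pred)    # close to reg_line, up to Monte Carlo error
var(y_pred)     # positive: draws carry cluster-assignment and residual noise
```

The weighted average is a single value for a given set of cluster parameters, while the Monte Carlo draws scatter around it; that extra scatter is why the predictive draws have higher variance than the regression line.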
Arguments
d_train: A training data.frame …
formula: Specified in the usual way, e.g. for …
d_test: Optional testing data.frame …
burnin: Integer specifying the number of burn-in MCMC draws.
iter: Integer greater than …
init_k: Optional. Integer specifying the initial number of clusters used to start the MCMC sampler.
phi_y: Optional. Length two …
beta_prior_mean: Optional. If there are …
beta_prior_var: Optional. If there are …
beta_var_scale: Optional. A multiplicative constant that scales …
mu_scale: Optional. An …
tau_x: Optional. Numeric vector of length two specifying the inverse gamma prior on each continuous covariate's prior variance. By default, the inverse gamma prior is centered around the empirical variance.
Details
We recommend normalizing continuous covariates and outcomes with the scale function before running fDPMix.
Please see https://stablemarkets.github.io/ChiRPsite/index.html for examples and detailed model and parameter descriptions.
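If the outcome is standardized before fitting, the returned draws are on the standardized scale. A minimal base-R sketch of keeping the constants that scale() stores as attributes, so that draws can be mapped back to the outcome's original units. The data are simulated, and draws_std is a hypothetical stand-in for a returned prediction matrix, not actual fDPMix() output.

```r
# Simulated raw data (hypothetical)
set.seed(1)
d_raw   <- data.frame(x = runif(50, 0, 10))
d_raw$y <- 3 + 2 * d_raw$x + rnorm(50)

# Standardize the outcome and keep the centering/scaling constants
y_scaled <- scale(d_raw$y)
y_center <- attr(y_scaled, "scaled:center")
y_sd     <- attr(y_scaled, "scaled:scale")

# Standardized data frame of the kind recommended above
d <- data.frame(x = as.numeric(scale(d_raw$x)),
                y = as.numeric(y_scaled))

# Hypothetical stand-in for posterior draws on the standardized scale:
# nrow(d) rows by 50 post-burn-in iterations
draws_std <- matrix(rnorm(nrow(d) * 50), nrow = nrow(d))

# Back-transform every draw to the original outcome units
draws_orig <- draws_std * y_sd + y_center
```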
Value
Returns predictions$train and cluster_inds$train. predictions$train is an nrow(d_train) by (iter - burnin) matrix of posterior predictions. cluster_inds$train is an nrow(d_train) by (iter - burnin) matrix of cluster assignment indicators, which can be passed to cluster_assign_mode() to compute the posterior mode assignment. predictions$test and cluster_inds$test are also returned if d_test is specified.
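A minimal sketch of summarizing a prediction matrix of the shape described above (one row per observation, one column per post-burn-in iteration) into a per-observation posterior mean and 90% credible interval. The matrix draws is a simulated, hypothetical stand-in; with real output, the returned training or test matrix would take its place.

```r
# Hypothetical stand-in: n_obs rows by (iter - burnin) columns of draws
set.seed(2)
n_obs  <- 20
iter   <- 100
burnin <- 50
draws  <- matrix(rnorm(n_obs * (iter - burnin)), nrow = n_obs)

# Row-wise summaries: posterior mean and 5% / 95% quantiles per observation
summary_df <- data.frame(
  post_mean = rowMeans(draws),
  lower_05  = apply(draws, 1, quantile, probs = 0.05),
  upper_95  = apply(draws, 1, quantile, probs = 0.95)
)
head(summary_df)
```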
Examples
library(ChiRP)
set.seed(1)
N = 200
x<-seq(1,10*pi, length.out = N) # covariate
y<-rnorm(n = length(x), sin(.5*x), .07*x )
d <- data.frame(x=x, y=y)
d$x <- as.numeric(scale(d$x))
d$y <- as.numeric(scale(d$y))
plot(d$x,d$y, pch=20, xlim=c(min(d$x), max(d$x)+1 ), col='gray')
d_test = data.frame(x=seq(max(d$x), max(d$x+1 ), .01 ))
res = fDPMix(d_train = d, d_test = d_test, formula = y ~ x,
iter=100, burnin=50, tau_x = c(.01, .001) )
## in-sample: posterior mean and 90% credible band
lines(d$x, rowMeans(res$train), col='steelblue')
lines(d$x, apply(res$train,1,quantile,probs=.05) , col='steelblue', lty=2)
lines(d$x, apply(res$train,1,quantile,probs=.95) , col='steelblue', lty=2)
## out of sample: posterior mean and 90% credible band
lines(d_test$x, rowMeans(res$test), col='pink')
lines(d_test$x, apply(res$test,1,quantile,probs=.05) , col='pink', lty=2)
lines(d_test$x, apply(res$test,1,quantile,probs=.95) , col='pink', lty=2)