This function takes a training data.frame and an optional testing data.frame and performs posterior sampling. It returns posterior predictions and posterior cluster assignments for the training and test sets. The function is built for zero-inflated, but otherwise continuous, outcomes.
ZDPMix(
  d_train,
  formula,
  d_test = NULL,
  burnin = 100,
  iter = 1000,
  phi_y = c(shape = 5, rate = 1000),
  beta_prior_mean = NULL,
  beta_prior_var = NULL,
  gamma_prior_mean = NULL,
  gamma_prior_var = NULL,
  init_k = 10,
  beta_var_scale = 1000,
  mu_scale = 1,
  tau_scale = 1,
  prop_sigma_z = diag(rep(0.025, nparams))
)
d_train |
A |
formula |
Specified in the usual way, e.g. for |
d_test |
Optional |
burnin |
Integer specifying the number of burn-in MCMC draws. |
iter |
integer greater than |
phi_y |
Optional. Length two |
beta_prior_mean |
Optional. If there are |
beta_prior_var |
Optional. If there are |
gamma_prior_mean |
Optional. If there are |
gamma_prior_var |
Optional. If there are |
init_k |
Optional. Integer specifying the initial number of clusters used to start the MCMC sampler. |
beta_var_scale |
Optional. A multiplicative constant that scales |
mu_scale |
Optional. A numeric scalar that controls how widely the continuous-covariate means of new clusters are dispersed around the empirical covariate mean. Specifically, all continuous covariates are assumed to have a Gaussian likelihood with a Gaussian prior on their means. |
tau_scale |
Optional. A numeric scalar that controls how widely the continuous-covariate variances of new clusters are dispersed around the empirical variance. Specifically, all continuous covariates are assumed to have a Gaussian likelihood with an Inverse Gamma prior on their variances. |
prop_sigma_z |
Optional. If you specified |
Please see https://stablemarkets.github.io/ChiRPsite/index.html for examples and detailed model and parameter descriptions.
Please see https://arxiv.org/abs/1810.09494 for a methodological reference.
Returns predictions$train and cluster_inds$train. predictions$train is an nrow(d_train) by iter - burnin matrix of posterior predictions. cluster_inds$train is an nrow(d_train) by iter - burnin matrix of cluster assignment indicators, which can be passed to the function cluster_assign_mode() to compute the posterior mode assignment. predictions$test and cluster_inds$test are also returned if d_test is specified.
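Because each row of a prediction matrix holds the retained posterior draws for one subject, row-wise summaries give point predictions and credible intervals. The sketch below is an illustration only: it uses a simulated stand-in matrix (the hypothetical object pred) with the documented shape rather than actual ZDPMix() output, and the names post_mean, post_ci and summary_df are likewise made up for this example.

```r
set.seed(1)

# Assumed settings for illustration; they mirror the shape described above.
n_obs   <- 5
iter    <- 200
burnin  <- 100
n_draws <- iter - burnin  # number of retained posterior draws

# Stand-in for res$predictions$train: one row per subject, one column per draw.
pred <- matrix(rnorm(n_obs * n_draws, mean = 100, sd = 20),
               nrow = n_obs, ncol = n_draws)

# Posterior mean prediction for each subject.
post_mean <- rowMeans(pred)

# 95% equal-tailed credible interval for each subject.
# apply() returns a 2 x n_obs matrix, so transpose to n_obs x 2.
post_ci <- t(apply(pred, 1, quantile, probs = c(0.025, 0.975)))

summary_df <- data.frame(mean  = post_mean,
                         lower = post_ci[, 1],
                         upper = post_ci[, 2])
print(summary_df)
```

With a fitted object, the same summaries would presumably be applied to predictions$train or predictions$test in place of pred.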
set.seed(1)
n <- 200

## generate from a clustered, skewed data distribution
X11 <- rnorm(n = n, mean = 10, sd = 3)
X12 <- rnorm(n = n, mean = 0, sd = 2)
X13 <- rnorm(n = n, mean = -10, sd = 4)

Y1 <- rnorm(n = n, mean = 100 + .5*X11, 20)*(1-rbinom(n, 1, prob = pnorm( -10 + 1*X11 ) ))
Y2 <- rnorm(n = n, mean = 200 + 1*X12, 30)*(1-rbinom(n, 1, prob = pnorm( 1 + .05*X12 ) ))
Y3 <- rnorm(n = n, mean = 300 + 2*X13, 40)*(1-rbinom(n, 1, prob = pnorm( -3 -.2*X13 ) ))

d <- data.frame(X1 = c(X11, X12, X13), Y = c(Y1, Y2, Y3))
d$X1 <- scale(d$X1)

ids <- sample(1:600, size = 500, replace = FALSE)
d_train <- d[ids, ]
d_test <- d[-ids, ]

res <- ChiRP::ZDPMix(d_train = d_train, d_test = d_test, formula = Y ~ X1,
                     burnin = 100, iter = 200, init_k = 5, phi_y = c(10, 10000))
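Since the function targets zero-inflated outcomes, it can be useful to check how many exact zeros an outcome contains before fitting. The following sketch regenerates a single component in the same way as the first component of the example above and inspects its zero share. The variable names (x, is_zero, y, zero_share, pos_summary) are hypothetical and chosen only for this illustration; this is not part of the ChiRP API.

```r
set.seed(1)
n <- 200

# One covariate, as in the first simulated component of the example.
x <- rnorm(n, mean = 10, sd = 3)

# Zero indicator drawn with a probit probability that depends on x.
is_zero <- rbinom(n, 1, prob = pnorm(-10 + 1 * x))

# Continuous outcome, set to exactly zero wherever is_zero == 1.
y <- rnorm(n, mean = 100 + 0.5 * x, sd = 20) * (1 - is_zero)

# Share of exact zeros in the outcome.
zero_share <- mean(y == 0)

# Distribution of the non-zero (continuous) part of the outcome.
pos_summary <- summary(y[y != 0])

print(zero_share)
print(pos_summary)
```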