This function takes a training data.frame and an optional testing data.frame and performs posterior sampling. It returns posterior predictions and posterior clustering for the training and test sets. The function is built for binary outcomes.
d_train | A
formula | Specified in the usual way, e.g. for
d_test | Optional
burnin | integer specifying number of burn-in MCMC draws.
iter | integer greater than
beta_prior_mean | Optional. If there are
beta_prior_var | Optional. If there are
init_k | Optional. integer specifying the initial number of clusters to kick off the MCMC sampler.
beta_var_scale | Optional. A multiplicative constant that scales
mu_scale | Optional. A numeric scalar constant that controls how widely the continuous covariate means of new clusters are spread around the empirical covariate mean. Specifically, all continuous covariates are assumed to have a Gaussian likelihood with a Gaussian prior on their means.
tau_scale | Optional. A numeric scalar constant that controls how widely the continuous covariate variances of new clusters are spread around the empirical variance. Specifically, all continuous covariates are assumed to have a Gaussian likelihood with an Inverse Gamma prior on their variance.
prop_sigma_b | Optional. If you specified
Please see https://stablemarkets.github.io/ChiRPsite/index.html for examples and detailed model and parameter descriptions.
Returns predictions$train and cluster_inds$train. predictions$train returns an nrow(d_train) by iter - burnin matrix of posterior predictions. cluster_inds$train returns an nrow(d_train) by iter - burnin matrix of cluster assignment indicators, which can be input into the function cluster_assign_mode() to compute posterior mode assignment. predictions$test and cluster_inds$test are returned if d_test is specified.
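As a rough illustration of how matrices of this shape might be summarized, the sketch below builds simulated stand-ins for predictions$train and cluster_inds$train (it does not call the package) and computes per-subject posterior means, 95% credible intervals, and the most frequent cluster label across draws. The names pred_draws, clus_draws, and n_obs are hypothetical, the stand-in values are assumed to lie in [0, 1], and the base-R mode calculation is only an approximation of what cluster_assign_mode() is described as doing, not its actual implementation.

```r
set.seed(42)

n_obs   <- 6               # hypothetical number of training rows
iter    <- 200
burnin  <- 100
n_draws <- iter - burnin   # number of retained posterior draws

# Stand-in for predictions$train: one row per subject, one column per draw.
# Values are simulated on [0, 1] purely for illustration.
pred_draws <- matrix(rbeta(n_obs * n_draws, 2, 5),
                     nrow = n_obs, ncol = n_draws)

# Stand-in for cluster_inds$train: a cluster label per subject per draw.
clus_draws <- matrix(sample(1:3, n_obs * n_draws, replace = TRUE),
                     nrow = n_obs, ncol = n_draws)

# Posterior mean and 95% credible interval for each subject (row).
post_mean <- rowMeans(pred_draws)
post_ci   <- t(apply(pred_draws, 1, quantile, probs = c(0.025, 0.975)))

# Most frequent cluster label across draws for each subject,
# returned as a character label.
post_mode <- apply(clus_draws, 1, function(z) {
  tab <- table(z)
  names(tab)[which.max(tab)]
})
```

With real output, the same row-wise summaries would be applied to the returned matrices in place of the simulated ones.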
set.seed(1)
n <- 1000 # simulate 1000 subjects
# Cluster 1: 2 continuous covariates with mean 10, standard deviation 5
x1 <- replicate(2, rnorm(n/2, 10, 5))
# Cluster 2: 2 continuous covariates with mean -10, standard deviation 5
x2 <- replicate(2, rnorm(n/2, -10, 5))
# outcome for both clusters
y <- numeric(length = n)
y[1:500] <- rbinom(500, 1, pnorm(-5 + x1 %*% matrix(c(-2,2), ncol=1) ) )
y[501:1000] <- rbinom(500, 1, pnorm(5 + x2 %*% matrix(c(2,-2), ncol=1) ) )
# combine into data.frame
d <- data.frame(rbind(x1,x2), y)
# normalize covariates...helps for numerical stability.
d$X1 <- scale(d$X1)
d$X2 <- scale(d$X2)
set.seed(100)
# split data into training (800 obs) and testing (200 obs)
test_ids <- sample(1:1000, 200, replace = FALSE )
d_test <- d[test_ids, ]
d_train <- d[-test_ids, ]
# fit the model; assumes the ChiRP package (which provides PDPMix) is installed
library(ChiRP)
logit_res <- PDPMix(d_train = d_train, d_test = d_test,
                    formula = y ~ X1 + X2,
                    burnin = 100, iter = 200,
                    beta_prior_var = rep(3, 3),          # fairly flat priors
                    beta_prior_mean = rep(0, 3),         # null centered priors
                    prop_sigma_b = diag(rep(.001, 3)),   # proposal covariance
                    init_k = 5, tau_scale = 3, mu_scale = 3)