View source: R/surbart_eqbyeq.R
surbart_eqbyeq | R Documentation |
Seemingly Unrelated Regression Bayesian Additive Regression Trees implemented using MCMC with equation-by-equation tree draws.
surbart_eqbyeq(
x.train,
x.test,
y,
num_outcomes,
num_obs,
num_test_obs,
n.iter = 1000,
n.burnin = 100,
n.trees = 50L,
n.burn = 0L,
n.samples = 1L,
n.thin = 1L,
n.chains = 1,
n.threads = guessNumCores(),
printEvery = 100L,
printCutoffs = 0L,
rngKind = "default",
rngNormalKind = "default",
rngSeed = NA_integer_,
updateState = FALSE,
tree.prior = dbarts:::cgm,
node.prior = dbarts:::normal,
resid.prior = dbarts:::chisq,
proposal.probs = c(birth_death = 0.5, swap = 0.1, change = 0.4, birth = 0.5),
sigmadbarts = NA_real_,
print.opt = 100,
quiet = FALSE,
outcome_draws = FALSE
)
x.train |
The training covariate data for all training observations. Matrix or list. If one matrix (not a list), then the same set of covariates are used for each outcome. Each element of the list is a covariate matrix that corresponds to a different outcome variable. Number of rows equal to the number of observations. Number of columns equal to the number of covariates. |
x.test |
The test covariate data for all test observations. Matrix or list. If one matrix (not a list), then the same set of covariates are used for each outcome. Each element of the list is a covariate matrix that corresponds to a different outcome variable. Number of rows equal to the number of observations. Number of columns equal to the number of covariates. |
y |
The training data list of vectors of outcomes. The length of the list should equal the number of types of outcomes. Each element of the list should be a vector of length equal to the number of units. |
num_outcomes |
The number of outcome variables. |
num_obs |
The number of observations per outcome. |
num_test_obs |
The number of test observations per outcome. |
n.iter |
Number of iterations, excluding burn-in. |
n.burnin |
Number of burn-in iterations. |
n.trees |
(dbarts control option) A positive integer giving the number of trees used in the sum-of-trees formulation. The same number of trees is used for every outcome variable. Each outcome is modelled using its own distinct sum of trees (as opposed to shared trees). |
n.chains |
(dbarts control option) A positive integer giving the number of independent chains for the dbarts sampler to use. More than one chain is unlikely to improve speed because only one sample is drawn in each call to dbarts. |
n.threads |
(dbarts control option) A positive integer controlling how many threads will be used for various internal calculations, as well as the number of chains. Internal calculations are highly optimized so that single-threaded performance tends to be superior unless the number of observations is very large (>10k), so that it is often not necessary to have the number of threads exceed the number of chains. |
printEvery |
(dbarts control option) If verbose is TRUE, every printEvery potential samples (after thinning) will issue a verbal statement. Must be a positive integer. |
printCutoffs |
(dbarts control option) A non-negative integer specifying how many of the decision rules for a variable are printed in verbose mode. |
rngKind |
(dbarts control option) Random number generator kind, as used in set.seed. For type "default", the built-in generator will be used if possible. Otherwise, will attempt to match the built-in generator’s type. Success depends on the number of threads. |
rngNormalKind |
(dbarts control option) Random number generator normal kind, as used in set.seed. For type "default", the built-in generator will be used if possible. Otherwise, will attempt to match the built-in generator’s type. Success depends on the number of threads and the rngKind. |
rngSeed |
(dbarts control option) Random number generator seed, as used in set.seed. If the sampler is running single-threaded or has one chain, the behavior will be as any other sequential algorithm. If the sampler is multithreaded, the seed will be used to create an additional pRNG object, which in turn will be used to sequentially seed the thread-specific pRNGs. If equal to NA, the clock will be used to seed pRNGs when applicable. |
updateState |
(dbarts control option) Logical setting the default behavior for many sampler methods with regards to the immediate updating of the cached state of the object. A current, cached state is only useful when saving/loading the sampler. |
tree.prior |
(dbarts option) An expression of the form dbarts:::cgm or dbarts:::cgm(power,base) setting the tree prior used in fitting. |
node.prior |
(dbarts option) An expression of the form dbarts:::normal or dbarts:::normal(k) that sets the prior used on the averages within nodes. |
resid.prior |
(dbarts option) An expression of the form dbarts:::chisq or dbarts:::chisq(df,quant) that sets the prior used on the residual/error variance. |
proposal.probs |
(dbarts option) Named numeric vector or NULL, optionally specifying the proposal rules and their probabilities. Elements should be "birth_death", "change", and "swap" to control tree change proposals, and "birth" to give the relative frequency of birth/death in the "birth_death" step. |
sigmadbarts |
(dbarts option) A positive numeric estimate of the residual standard deviation. If NA, a linear model is used with all of the predictors to obtain one. |
print.opt |
Print every print.opt number of Gibbs samples. |
quiet |
If TRUE, the progress bar is not shown. |
outcome_draws |
If TRUE, output draws of the outcome. |
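The input shapes described in the argument list can be sketched with simulated data. This is a minimal illustration and not part of the package: the helper `check_sur_inputs` and the simulated objects `x_shared` and `y_sim` are hypothetical, and the helper only verifies the dimensions stated above (one covariate matrix per outcome, each with `num_obs` rows, and one outcome vector of length `num_obs` per outcome).

```r
# Hypothetical helper (not part of the package): checks that the inputs
# match the shapes described in the argument list.
check_sur_inputs <- function(x.train, y, num_outcomes, num_obs) {
  # A single matrix means the same covariates are used for every outcome,
  # so repeat it once per outcome to obtain the list form.
  if (is.matrix(x.train)) {
    x.train <- rep(list(x.train), num_outcomes)
  }
  stopifnot(
    is.list(x.train), length(x.train) == num_outcomes,
    is.list(y), length(y) == num_outcomes,
    all(vapply(x.train, nrow, integer(1)) == num_obs),
    all(lengths(y) == num_obs)
  )
  invisible(x.train)
}

# Simulated data: 20 units, 2 outcomes, 3 shared covariates
set.seed(1)
num_obs <- 20
num_outcomes <- 2
x_shared <- matrix(rnorm(num_obs * 3), nrow = num_obs, ncol = 3)
y_sim <- list(rnorm(num_obs), rnorm(num_obs))

x_list <- check_sur_inputs(x_shared, y_sim, num_outcomes, num_obs)
length(x_list)    # one covariate matrix per outcome
dim(x_list[[1]])  # num_obs rows, one column per covariate
```

Running a check of this kind before calling the sampler may catch mismatched row counts early; whether the function performs similar checks internally is not stated here.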
The following objects are returned:
mutrain_draws |
Matrix of MCMC draws of expected values for training observations. Number of rows equal to the number of training observations multiplied by the number of outcomes. The rows are ordered by beginning with all N (units') observations for the first outcome variable, then all N for the second outcome variable and so on. Number of columns equals n.iter. |
mutest_draws |
Matrix of MCMC draws of expected values for test observations. Number of rows equal to the number of test observations multiplied by the number of outcomes. The rows are ordered by beginning with all Ntest (units') observations for the first outcome variable, then all Ntest for the second outcome variable and so on. Number of columns equals n.iter. |
ytrain_draws |
Matrix of MCMC draws of outcomes for training observations. Number of rows equal to the number of training observations multiplied by the number of outcomes. The rows are ordered by beginning with all N (units') observations for the first outcome variable, then all N for the second outcome variable and so on. Number of columns equals n.iter. |
ytest_draws |
Matrix of MCMC draws of outcomes for test observations. Number of rows equal to the number of test observations multiplied by the number of outcomes. The rows are ordered by beginning with all Ntest (units') observations for the first outcome variable, then all Ntest for the second outcome variable and so on. Number of columns equals n.iter. |
Sigma_draws |
Three-dimensional array of MCMC draws of the covariance matrix for the outcome-specific error terms. The numbers of rows and columns are both equal to the number of outcome variables. The number of slices is |
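The stacked row ordering of the draw matrices and the covariance array can be unpacked as in the following sketch. It uses simulated stand-ins rather than real sampler output, and it assumes that the third dimension of `Sigma_draws` indexes MCMC draws, which the description above does not state explicitly. The names `mu_draws`, `Sigma_sim`, and `rows_for_outcome` are hypothetical.

```r
set.seed(1)
N <- 5
num_outcomes <- 2
n_iter <- 100

# Simulated stand-in for mutrain_draws: (N * num_outcomes) rows, n_iter columns.
# Rows 1..N belong to outcome 1, rows (N + 1)..(2 * N) to outcome 2.
mu_draws <- rbind(
  matrix(rnorm(N * n_iter, mean = 0), nrow = N),
  matrix(rnorm(N * n_iter, mean = 10), nrow = N)
)

# Hypothetical helper: row indices of the k-th outcome in the stacked matrix
rows_for_outcome <- function(k, N) ((k - 1) * N + 1):(k * N)

# Posterior mean for each unit, computed separately for each outcome
post_mean_1 <- rowMeans(mu_draws[rows_for_outcome(1, N), , drop = FALSE])
post_mean_2 <- rowMeans(mu_draws[rows_for_outcome(2, N), , drop = FALSE])

# Simulated stand-in for Sigma_draws (ASSUMPTION: one slice per draw).
# Every slice is the same matrix here, purely to keep the sketch short.
Sigma_true <- matrix(c(1, 0.5, 0.5, 2), nrow = 2, ncol = 2)
Sigma_sim <- array(rep(Sigma_true, n_iter),
                   dim = c(num_outcomes, num_outcomes, n_iter))

# Element-wise posterior mean of the covariance matrix across draws
Sigma_mean <- apply(Sigma_sim, c(1, 2), mean)

# Implied correlation between the error terms of the two outcomes
cor_mean <- cov2cor(Sigma_mean)
```

Note that `cov2cor(Sigma_mean)` gives the correlation implied by the averaged covariance matrix; averaging the per-draw correlations instead would generally give a slightly different number.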
library(foreign)
library(systemfit)
hsb2 <- read.dta("https://stats.idre.ucla.edu/stat/stata/notes/hsb2.dta")
train_inds <- sample(1:nrow(hsb2),size = 180, replace = FALSE)
test_inds <- (1:nrow(hsb2))[-train_inds]
hsb2train <- hsb2[train_inds,]
hsb2test <- hsb2[test_inds,]
r1 <- read~female + as.numeric(ses) + socst
r2 <- math~female + as.numeric(ses) + science
fitsur <- systemfit(list(readreg = r1, mathreg = r2), data=hsb2train)
linSURpreds <- predict(fitsur, hsb2test)
sqrt(mean((linSURpreds$readreg.pred-hsb2test$read)^2))
sqrt(mean((linSURpreds$mathreg.pred-hsb2test$math)^2))
xtrain <- list()
xtrain[[1]] <- cbind(hsb2train$female, as.numeric(hsb2train$ses), hsb2train$socst)
xtrain[[2]] <- cbind(hsb2train$female, as.numeric(hsb2train$ses), hsb2train$science)
xtest <- list()
xtest[[1]] <- cbind(hsb2test$female, as.numeric(hsb2test$ses), hsb2test$socst)
xtest[[2]] <- cbind(hsb2test$female, as.numeric(hsb2test$ses), hsb2test$science)
ylist <- list()
ylist[[1]] <- hsb2train$read
ylist[[2]] <- hsb2train$math
surbartres <- surbart_eqbyeq(x.train = xtrain, # either one matrix or a list
                             x.test = xtest,   # either one matrix or a list
                             y = ylist,
                             num_outcomes = 2,
                             num_obs = nrow(hsb2train),
                             num_test_obs = nrow(hsb2test),
                             n.iter = 1000,
                             n.burnin = 100)
readpredsurbart <- apply(surbartres$mutest_draws[,,1],2,mean)
mathpredsurbart <- apply(surbartres$mutest_draws[,,2],2,mean)
sqrt(mean((readpredsurbart - hsb2test$read)^2))
sqrt(mean((mathpredsurbart - hsb2test$math)^2))