spavs: Posterior sampling of configurations of sets of responses and...

View source: R/spavs.R

spavsR Documentation

Posterior sampling of configurations of sets of responses and associated regressors.

Description

spavs is the main function of the package. It runs a Markov chain or multiple chains corresponding either to different temperatures when parallel tempering is used or to different initializations when tempering is not used. At each iteration, a move is proposed that is chosen with equal probability among those allowed. A move can be a merge or a split move of one of two kinds (type 1 and of type 2). The details of the algorithm are described in the paper by S. Monni and M. G. Tadesse (2009). As far as possible, the input statistical parameters of the spavs function follow the notation in that paper. This function does not return any R objects. Instead the MCMC output (i.e., the sampled configurations) and other information are written into output files as detailed below. If needed for analysis, the sampled configurations can then be loaded in R from these output files as lists, employing the function conf.file.to.list. Estimates of posterior probabilities can be computed with the function pairwise.probs.

Usage

spavs(
  Y,
  X,
  h0,
  h,
  nu,
  sigma0.square,
  rho,
  alpha0 = NULL,
  beta0 = NULL,
  p.split = 1/2,
  tempering = FALSE,
  inv.temp = 1,
  tchains = 1,
  tswaps = 0,
  mws = 30000,
  parallel.params = list(cores = 1, preschedule = TRUE, affinity.list = NULL),
  init.conf = c("random", "predetermined", "data"),
  init.conf.params,
  thin = 200,
  outputfile = "output",
  print.monitoring = FALSE
)

Arguments

Y

the matrix of outcomes: an n*q matrix, with q the number of reponses and n the number of samples (the transpose of the matrix Y in the paper).

X

the matrix of predictors: an n*p matrix, with p the number of predictors (the transpose of the matrix X in the paper).

h0, h

hyperparameters that control the variance of the normal priors of the regression coefficients. h0 controls the variance of the intercepts and h the variance of the regression coefficients of the regressors X.

sigma0.square, nu

the shape and scale hyperparameters for the inverse-Gamma prior of the component residuals variances sigma_k^2. The prior mode is nu/(sigma0.square+1).

rho

a real number 0< rho < 1. It controls the prior probability: P(C)=rho^{m*n}, for an (m, n) configuration C.

alpha0

a length-q numeric vector. If NULL (default) the entries are set to 0. It is the mean of the normal prior of the intercepts of the mean function of (Y_1, …, Y_q).

beta0

a length-p numeric vector. If NULL (default) the entries are set to 0. It is the mean of the normal prior of the vector of the regression coefficients for predictors (X_1, …, X_p).

p.split

controls the number of common predictors in the components resulting from a split move of type 2. p.split must be in the interval (0, 1). If the component to be split has m regressors, the number of common regressors in the two resulting components is k, with probability, when m>0,

P(k)=p.split/(1+\floor(m/2)),

for k=0,1, …, \floor(m/2)

P(k)=(1-p.split)/(m-\floor(m/2)),

for k=\floor(m/2), …, m, and, when m=0, k=0 with probability 1.

tempering

TRUE/FALSE specifies whether or not tempering is to be employed.

inv.temp

a numeric vector of either length 1 or length equal to tchains. It is considered only when tempering=TRUE. If it has length 1, it must be a positive number in the interval (0, 1/tchains], in which case it represents the difference between inverse temperatures of the chains, so that the tempering schedule is 1/T_i= 1- (i-1)*inv.temp, for i=1, …, tchains. If it has length equal to tchains, it represents the vector of the inverse temperatures in the chains: one entry must then be equal to one, and the others must be unique values in (0, 1). The program will order the entries in decreasing order.

tchains

an integer greater than 0 indicating the number of chains. If tempering=TRUE the chains are run at different temperatures, otherwise no tempering swaps are attempted.

tswaps

total number of temperature swaps (moves that exchange configurations at different temperatures), when tempering=TRUE in which case it must be >=1. The parameter can be used even when tempering=FALSE and no temperature swaps occur. The total number of iterations in each chain is given by (tswaps+1)*mws.

mws

number of moves in each chain within attempts at temperature-transfers. If tempering is not implemented, one can either set tswaps=0, so that mws is the number of iterations, or one can use tswaps >0, in which case the total number of iterations is given by (tswaps+1)*mws. This latter option is preferable when one wants to monitor the acceptance rates, which, with the option print.monitoring=TRUE, will appear on the screen.

parallel.params

a list with three entries, cores, preschedule, affinity.list. They are the parameters for the parallelization, with different chains run in parallel. The code is parallelized with the function parallel::mclapply and these three values correspond to mc.cores, mc.preschedule, and affinity.list in that function. See mclapply. In particular, cores is the number of cores to use. Ideally it should be equal to tchains, and there is no gain in selecting it larger than tchains. If preschedule and affinity.list are not specified, the default values of mclapply are used. All other parameters in mclapply are the default ones.

init.conf

one of the strings: "random", predetermined", "data". If init.conf="random", the initial configuration is randomly chosen, with the initial number of groups specified in init.conf.params. If init.conf="predetermined", the initial configuration is loaded from a file, specified in init.conf.params. The file must have one configuration per line and the number of lines must be equal to the number of tempering chains. Each initial configuration, one per tempering chain, must appear in the same form as the software output: e.g., 5 , 3 ; 7 , 2 ; , 1 ; see below the description of the output files. There is no check that the file is correct other than possible errors thrown by the function used to read the file, which is the function conf.file.to.list. If init.conf="data", the starting configuration (the same for all chains) is determined using k-means on the Y data. This option is available only when the number of responses q ≥ 3.

init.conf.params

If init.conf="random", this parameter must be an integer or a vector of integers of length equal to tchains (number of tempering chains). It denotes the number of components of the initial configuration in each chain. If more than one chain is run, and the vector has length 1, all initial configurations have the same number of components equal to this integer, but they are chosen randomly. If init.conf="predetermined", it must be the name of the file that contains the initial configuration(s) from which to start. If init.conf="data", this parameter is ignored.

thin

an integer indicating the thinning used for the sampler (namely, the configuration of the chain at T=1 are output every thin iterations). If one wants to have any sampled configurations saved in the file, choose thin such that thin mws.

outputfile

a string, the prefix to the names of the 3 files where the sampled configurations are written.

print.monitoring

TRUE/FALSE. If TRUE, print on screen the acceptance rates after each temperature swap.

Value

none. The routine writes the samples and other information in files and does not save R objects. Use conf.file.to.list to load the saved configurations into R in the form of lists. There are three types of output files, all with the same prefix specified in the option outputfile. They are:

   -"outputfile_tk_conf.txt" will contain the sampled configurations
    of the k-th chain.  If 'tempering=TRUE', only one file for T=1 is
    printed.
    Each configuration is in one line.
    Components are separated by a semicolon. A comma separates
    the indices of the regressors and responses of a component.
    E.g., the line:  5 , 3 ; 5 7 , 2 ; , 1 ; represents:
    C1= X_5, Y_3;  C2=X_5, X_7, Y_2;   C3=Y_1.

   -"outputfile_logpost_size.txt",  a two-column file
     with each row having logpost and size of the sampled configuration 
     saved in the corresponding row of the "outputfile_t_k_conf.txt" file.

   -"outputfile_ar.txt" tabulates the acceptance rates for the moves in 
     all chains and, with tempering, for the swaps between thermal chains.
     Each row  has the number of attempted and accepted moves of a specific
     type, for all different chains in order of increasing temperature 
     (the first column being thus the T_1=1 chain).
     The last row gives proposed and accepted numbers of swaps between 
     contiguous tempering chains, in order of increasing temperature: 
     the first two columns refer to the swap between the T_1=1 and T_2
     chains, the second two to the swap between the T_2 and T_3 chains... 

References

S. Monni, M. G. Tadesse, A Stochastic Partitioning Method to Associate High-dimensional Responses and Covariates, Bayesian Analysis (2009) 4, pp. 413-436. https://doi.org/10.1214/09-BA416

Examples

# four chains at temperatures 1, 0.9, 0.85, 0.81, 
# with a parallelization using 4 cores  
spavs(Y=Y, X=X, h0=2, h=2, nu=1, sigma0.square=3, rho=0.3, tchains=4, tempering=TRUE, inv.temp=c(1,0.9, 0.85, 0.81), init.conf="random", init.conf.params=c(3,3,1,4), parallel.params=list(cores=4))


# four chains at the same temperature T=1 (no tempering) 
# with a parallelization using 4 cores.
# this is equivalent to 4 different runs
spavs(Y=Y, X=X, h0=2, h=2, nu=1, sigma0.square=3, rho=0.3, tchains=4, tempering=FALSE, init.conf="random", init.conf.params=c(3,3,1,4), parallel.params=list(cores=4))


# one  chain 
spavs(Y=Y, X=X, h0=2.2, h=2, nu=2, sigma0.square=1.3, rho=0.01, p.split=0.8, tempering=FALSE, tchain=1, init.conf="data", outputfile="run1")


stefanomonni/spavs documentation built on April 5, 2022, 1:26 a.m.