R2GUESS: Wrapper function that reads the input files and parameter...
In R2GUESS: Wrapper Functions for GUESS


The R2GUESS function reads and compiles the data, input files and parameters required to run the GUESS source code. It then runs GUESS (with or without GPU acceleration) and saves the results and summary files as text files. For portability, R2GUESS generates an ESS object that compiles information about the inputs and parameters used to run GUESS, together with its outputs, as detailed in as.ESS.object.

  R2GUESS(dataY, dataX, path.input, path.output, path.par,
    path.init = NULL, file.par, file.init = NULL,
    file.log = NULL, nsweep, burn.in, Egam,
    Sgam, root.file.output, time = TRUE, top = 100,
    history = TRUE, label.X = NULL, label.Y = NULL,
    choice.Y = NULL, nb.chain, conf = NULL, cuda = TRUE,
    MAP.file = NULL, time.limit = NULL, seed = NULL)

`dataY`	either a one-element character vector (such as '`dataY.txt`') or a data frame. If `dataY` is a character vector, it gives the name of the response matrix file, assumed to be in the `path.input` folder. In that file, observations are in rows and the (possibly multivariate) outcome(s) in columns; the first two rows (single integers) give the number of rows (`n`) and columns (`q`) of the matrix. If a data frame is passed, it should be an `nxq` numeric matrix of observed responses.
`dataX`	either a one-element character vector (such as '`dataX.txt`') or a data frame. If `dataX` is a character vector, it gives the name of the predictor matrix file, assumed to be in the `path.input` folder. In that file, observations are in rows and predictors in columns; the first two rows (single integers) give the number of rows (`n`) and columns (`p`) of the matrix. If a data frame is passed, it should be an `nxp` numeric matrix of observed predictors.
`path.input`	path to the directory containing the data (`dataX` and `dataY`). If `dataX` and/or `dataY` are entered as data frames, the function writes the corresponding text files required to run `GUESS` to the `path.input` folder.
`path.output`	path indicating the directory in which output files will be saved.
`path.par`	path indicating the directory in which to find the parameter file needed to run `GUESS`.
`path.init`	path indicating the location of the file describing the initial guess of the MCMC procedure (i.e. the variables to include in the initial model).
`file.par`	name of the parameter file containing all user-specified parameters required to set up the run and the features of the moves. This file is located in `path.par`, and its fields are described extensively in http://www.bgx.org.uk/software/GUESS_Doc_short.pdf. These parameters are optional; any that are not specified are set to their default values, also given in that documentation. An example of this file is provided in the package.
`file.init`	name of the file specifying which variables to include at the first iteration of the MCMC run. The first row of the file is a single scalar giving the number of variables to include; subsequent rows give the positions of the covariates to include. This file is optional; if it is not specified (default `NULL`), the initial model of the MCMC algorithm is derived from a stepwise regression approach.
`file.log`	name of the log file. This file records, in real time, summary information describing the initial parameters, the computational time and the state of the run. It also contains information about the moves sampled at each sweep. By default (`NULL`), the name is given by `root.file.output` extended by `'_log'`, and for computational efficiency (especially when the GPU is enabled) only a minimal amount of information is written.
`nsweep`	integer specifying the number of sweeps for the MCMC run (including the burn-in).
`burn.in`	integer specifying the number of initial sweeps to be discarded as burn-in.
`Egam`	numeric giving the prior average model size.
`Sgam`	numeric giving the prior standard deviation of the model size.
`root.file.output`	name specifying the file stem for writing the output files in the directory specified by `path.output`.
`time`	Boolean value. When `time=TRUE` (default value) a file recording the time each sweep took will be created and saved in `path.output` directory.
`top`	number of top models to be reported in the output. The default value is 100.
`history`	Boolean value. When `history=TRUE` (default value), a number of additional output files that record the history of each move is provided. See section 5 of http://www.bgx.org.uk/software/GUESS_Doc_short.pdf for more details.
`label.X`	a character vector specifying the names of the predictors. If not specified (`NULL`), variables are labelled by their position in the matrix. For SNP data, predictor names and annotation are taken from the `MAP.file` (field `SNPName`).
`label.Y`	a character vector specifying the names of the outcomes. If not specified (`NULL`), the outcomes are labelled Y1, ..., Yq, where q is the number of columns in the outcome matrix, or take their names from `dataY` if it is given as a data frame.
`choice.Y`	a character vector or a numeric vector specifying which phenotypes in the response matrix `dataY` to analyse in a joint model. By default, all phenotypes in the response matrix will be considered.
`nb.chain`	an integer specifying the number of chains to consider in the evolutionary procedure.
`conf`	either a one-element character vector (such as '`conf.txt`') or a data frame. If `conf` is a character vector, it gives the name of the confounder matrix file, assumed to be in the `path.input` folder. In that file, observations are in rows and the confounders in columns; the first two rows (single integers) give the number of rows (`n`) and columns (`k`) of the matrix. If a data frame is passed, it should be an `nxk` numeric matrix of observed confounders. If specified, the function replaces the response matrix by the residuals from the linear model regressing the outcomes on the confounders.
`cuda`	a Boolean value. `cuda=TRUE` redirects linear algebra operations to the GPU. On platforms that are not CULA-compatible, this option is ignored.
`MAP.file`	either a one-element character vector or a data frame. If a character vector is used, it gives the name of the annotation file, assumed to be in the `path.input` folder. In that file, predictors are presented in rows, each described by its annotation fields (such as `SNPName`). If a data frame is passed, it should be a `px3` matrix.
`time.limit`	a numerical value specifying the maximum computing time (in hours) for the run. If the run exceeds this value, the modelling options, parameter values, state of the pseudo-random number generator and state of each chain are saved, so that the run can later be resumed exactly where it was interrupted (using the `resume` option). By default (`NULL`), the run continues until completion.
`seed`	an integer specifying the seed used to initialize the pseudo-random number generator. If not specified, the seed is initialized from the CPU clock.
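To illustrate the `conf` adjustment described above, the following base-R sketch regresses each response column on the confounders and keeps the residuals. It assumes an ordinary least-squares fit with an intercept, which may differ in detail from what R2GUESS does internally; `adjust_for_confounders` is a hypothetical helper, not an R2GUESS function, and the simulated data are for illustration only.

```r
# Hypothetical helper (not part of R2GUESS): replace each response column
# by its OLS residuals after regressing it on the confounders.
adjust_for_confounders <- function(Y, conf) {
  Y <- as.matrix(Y)
  conf <- as.matrix(conf)
  stopifnot(nrow(Y) == nrow(conf))
  # lm() with a matrix response fits one regression per column of Y;
  # an intercept is included by default (an assumption about R2GUESS).
  fit <- lm(Y ~ conf)
  res <- as.matrix(residuals(fit))
  colnames(res) <- colnames(Y)
  res
}

# Simulated example: Y1 depends on age, Y2 is pure noise.
set.seed(1)
n <- 50
conf <- data.frame(age = rnorm(n), sex = rbinom(n, 1, 0.5))
Y <- data.frame(Y1 = 2 * conf$age + rnorm(n), Y2 = rnorm(n))
Y.adj <- adjust_for_confounders(Y, conf)
```

The residuals have zero column means and are orthogonal to every confounder column, which is what removes the confounders' linear contribution from the outcomes.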

For the dataX and dataY arguments, if a data frame is passed, a text file named data-*-C-CODE.txt is created in the path.input directory. This file is formatted with the structure expected by the C++ code: individuals in rows and variables in columns, with the first two rows giving the number of rows and columns of the matrix. If conf is specified, additional files containing the adjusted responses are created following the same naming scheme. The returned ESS object includes all result files produced by the code; the number and type of outputs depend on the running options chosen. A full description of the available output can be found in http://www.bgx.org.uk/software/GUESS_Doc_short.pdf.
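For readers preparing input files by hand rather than passing data frames, a minimal base-R sketch of the text layout described above follows. It assumes whitespace-separated values; `write_guess_matrix` and `write_guess_init` are hypothetical helpers, not part of R2GUESS. The second one follows the `file.init` layout (count first, then one position per row) and writes positions as given, so the expected indexing base should be checked against the GUESS documentation.

```r
# Hypothetical helper (not part of R2GUESS): write a numeric matrix as
# row count, column count, then one space-separated line per observation.
write_guess_matrix <- function(x, file) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  writeLines(c(as.character(nrow(x)), as.character(ncol(x))), file)
  write.table(x, file, append = TRUE, row.names = FALSE,
              col.names = FALSE, quote = FALSE, sep = " ")
  invisible(file)
}

# Hypothetical helper for the file.init layout: number of variables to
# include, then one covariate position per row (indexing base unverified).
write_guess_init <- function(positions, file) {
  positions <- as.integer(positions)
  writeLines(as.character(c(length(positions), positions)), file)
  invisible(file)
}

# 3 observations (rows) x 2 predictors (columns)
X <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 3)
f <- tempfile(fileext = ".txt")
write_guess_matrix(X, f)
readLines(f)
```

With the 3 x 2 matrix above, `readLines(f)` returns the lines "3", "2", "0 1", "1 0" and "2 2".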

An object of class ESS containing information listed in as.ESS.object. The object can subsequently be used to post-process the results using provided R functions (such as summary.ESS, plotMPPI, plot.ESS).

Benoit Liquet b.liquet@uq.edu.au,
Marc Chadeau-Hyam m.chadeau@imperial.ac.uk,
Leonardo Bottolo l.bottolo@imperial.ac.uk,
Gianluca Campanella g.campanella11@imperial.ac.uk

as.ESS.object, summary.ESS, plotMPPI, plot.ESS

## Not run: 
library(R2GUESS)
path.input <- system.file("Input", package="R2GUESS")
path.output <- tempdir()
path.par <- system.file("extdata", package="R2GUESS")
file.par.Hopx <- "Par_file_example_Hopx.xml"
# the parameter file can be inspected at this location
print(file.path(path.par, file.par.Hopx))
## To reach convergence you may need to increase nsweep to 110000 and burn.in to 10000
## RUNNING TIME is APPROX 5 minutes
root.file.output.Hopx <- "Example-GUESS-Y-Hopx"
label.Y <- c("ADR","Fat","Heart","Kidney")
data(data.Y.Hopx)
data(data.X)
data(MAP.file)

modelY_Hopx <- R2GUESS(dataY=data.Y.Hopx, dataX=data.X, choice.Y=1:4,
  label.Y=label.Y, MAP.file=MAP.file, file.par=file.par.Hopx, file.init=NULL,
  file.log=NULL, root.file.output=root.file.output.Hopx, path.input=path.input,
  path.output=path.output, path.par=path.par, path.init=NULL, nsweep=11000,
  burn.in=1000, Egam=5, Sgam=5, top=100, history=TRUE, time=TRUE,
  nb.chain=3, conf=NULL, cuda=FALSE)

summary(modelY_Hopx,20) # 20 best models

print(modelY_Hopx)

## End(Not run)