In wlandau/fbseqComputation: fbseqComputation

fbseqComputation

[fbseqComputation package](https://github.com/wlandau/fbseqComputation) package is a smaller version of the fbseqStudies package package, created becase fbseqStudies package reveals too much too soon. [fbseqComputation package](https://github.com/wlandau/fbseqComputation) package reproduces the results of Section 5 ("Assessing computational tractability") of a paper entitled "A fully Bayesian strategy for high-dimensional hierarchical modeling using massively parallel computing" by Will Landau and Dr. Jarad Niemi. The goal is to assess the computational tractability of fbseq package and fbseqCUDA package using a real RNA-seq dataset [@paschold] and a simulation study based off this dataset. See the original paper for details.

System requirements

The R version and R packages listed in the "Depends", "Imports", and "Suggests" fields of the "package's DESCRIPTION file.
A CUDA-capable NVIDIA graphics processing unit (GPU) with compute capability 2.0 or greater. More information about CUDA is available through NVIDIA.
CUDA version 6.0 or greater.
Optional: the code uses double precision values for computation, so GPUs that natively support double precision will be much faster than ones that do not.

Installation

Option 1: install a stable release (recommended).

Navigate to a list of stable releases on the project's GitHub page. Download the desired tar.gz bundle, then install it either with install.packages(..., repos = NULL, type="source") from within R R CMD INSTALL from the Unix/Linux command line.

Option 2: use `install_github` to install the development version.

For this option, you need the devtools package, available from CRAN or GitHub. Open R and run

library(devtools)
install_github("wlandau/fbseqComputation")

Option 3: build the development version from the source.

Open a command line program such as Terminal in Mac/Linux and enter the following commands.

git clone git@github.com:wlandau/fbseqComputation.git
R CMD build fbseqComputation
R CMD INSTALL ...

where ... is replaced by the name of the tarball produced by R CMD build.

Run `paper_computation()` to reproduce the analysis

The function paper_computation() reproduces the computational results of Section 5 (Assessing computational tractability) of "A fully Bayesian strategy for high-dimensional hierarchical modeling using massively parallel computing" by Will Landau and Dr. Jarad Niemi. Internally, paper_computation() runs the following functions.

real_mcmc(), which runs the Markov chain Monte Carlo procedure on the @paschold dataset and outputs the raw computation to a folder called real_mcmc. This function should take a few hours.
computation_mcmc(), which runs the Markov chain Monte Carlo procedure on the simulated datasets described in Section 5.1 ("The scaling of performance with the size of the data"), outputting the raw computation to a folder called computation_mcmc. This function should take a few days to run.
real_analyze(), which uses the raw computation output in the real_mcmc folder to extract the results of Section 5 pertaining to the @paschold dataset and put them in a folder called real_analyze. Gelman-Rubin potential scale reduction factors are in real_analyze/gelman/gelman.rds, effective sample size information is in an RDS file in real_analyze/ess/ess.rds, and a table of runtimes is in an RDS file in real_analyze/runtime/runtime.rds. RDS files are raw R objects, which can be read into R with the readRDS() function.
computation_analyze(), which uses the raw computation output in the computation_mcmc folder to extract the results of Section 5 pertaining to the simulated datasets and put them in a folder called computation_analyze. Gelman-Rubin potential scale reduction factors are in computation_analyze/gelman/gelman.rds, effective sample size information is in an RDS file in computation_analyze/ess/ess.rds, and a table of runtimes is in an RDS file in computation_analyze/runtime/runtime.rds. RDS files are raw R objects, which can be read into R with the readRDS() function.

For further control during the computation, the user may run any of the above functions individually. real_mcmc() and computation_mcmc() do not have to be run completely through in a single run, as they do not overwrite existing completed Markov chain Monte Carlo (MCMC) runs in the real_mcmc and computation_mcmc folders, respectively. To see which MCMCs analyses are completed so far, run progress("real_mcmc") or progress("computation_mcmc"). For even finer-grained control, see the manual and help files.