Performing a bootstrap analysis with qtl2pleio
requires first sampling a (bivariate) phenotype from the inferred bivariate normal distribution, then a two-dimensional, two-QTL QTL scan over the defined scan region. Often, one wants to include more than 100 markers in a scan region. Each two-dimensional scan can take tens of minutes with a single core. Thus, performing a bootstrap analysis with, say, 1000 bootstrap samples is prohibitively time-consuming on most computers.
For this reason, we used a high-throughput computing cluster at the University of Wisconsin-Madison (where qtl2pleio
was developed). The UW-Madison Center for High-Throughput Computing (CHTC) is available to all UW-Madison researchers. While we recognize that qtl2pleio
users may not have access to the CHTC, we anticipate that some will have access to computing clusters. Hopefully this vignette can be adapted to your needs.
We placed on github a repository that contains the results from our bootstrap analysis of the Recla data.
Here is the repository's url: https://github.com/fboehm/qtl2pleio-manuscript-chtc/
Within this repo are three subdirectories. We'll examine in detail the "Recla-bootstrap" subdirectory.
Within this directory, we created four subdirectories:
The Github repository also contains a fifth subdirectory, squid
and a sixth subdirectory results
, that are included for completeness. In practice, files in squid
were in a distinct directory. We describe it below.
Rscript
The Rscript
subdirectory contains a file, boot-Recla-10-22.R
that contains R code for our bootstrap analysis.
fn <- file.path("https://raw.githubusercontent.com", "fboehm/qtl2pleio-manuscript/master/chtc", "Recla-bootstrap/Rscript/boot-Recla-10-22.R" ) foo <- readLines(fn)
Here we display the contents of our boot-Recla-10-22.R
file.
cat(foo, sep = "\n")
data
The data subdirectory contains three rds files, one for each of the three input components.
shell_scripts
We examine only the file that is needed for the Recla analysis, boot-Recla-10-22.sh
.
Here is the contents of the file.
```{bash, eval = FALSE}
tar -xzf R13.tar.gz tar -xzf SLIBS.tar.gz
export PATH=$(pwd)/R/bin:$PATH export LD_LIBRARY_PATH=$(pwd)/SS:$LD_LIBRARY_PATH
R CMD BATCH '--args argname='$1' nsnp='$2' s1='$3' nboot_per_job='$4' run_num='$5'' Rscript/boot-Recla-10-22.R 'boot400_run'$5'-'$1'.Rout'
We need to unzip the R installation, which we've named R13.tar.gz, and SLIBS file on the remote computer where the actual computing will occur. Following that, and needed adjustments to the paths, we execute R. Note that we have, in this file, specified the variable values by numbers, like `$1` for `argname`. The values assigned to each of these is specified in the "submit" file. ### Files in `submit_files` The HTCondor "submit" file provides instructions for controlling the interaction with the high-throughput resources. Below is the text in the submit file `boot-Recla-10-22.sub` ```{bash, eval = FALSE} # hello-chtc.sub # My very first HTCondor submit file s1=650 # start scan at this SNP index value nsnp=350 # length of scan in number of SNPs # which run in the experimental design? run_num=561 # nboot_per_job=1 # for almost all jobs), the desired name of the HTCondor log file, # and the desired name of the standard error file. # Wherever you see $(Cluster), HTCondor will insert the queue number # assigned to this set of jobs at the time of submission. universe = vanilla log = boot-$(Process)-run$(run_num).log error = boot-$(Process)-run$(run_num).err # # Specify your executable (single binary or a script that runs several # commands), arguments, and a files for HTCondor to store standard # output (or "screen output"). # $(Process) will be a integer number for each job, starting with "0" # and increasing for the relevant number of jobs. executable = ../shell_scripts/boot-Recla-10-22.sh arguments = $(Process) $(nsnp) $(s1) $(nboot_per_job) $(run_num) output = boot-$(Process)-run$(run_num).out # # Specify that HTCondor should transfer files to and from the # computer where each job runs. The last of these lines *would* be when_to_transfer_output = ON_EXIT transfer_input_files = ../data,../Rscript,../shell_scripts,http://proxy.chtc.wisc.edu/SQUID/fjboehm/R13.tar.gz,http://proxy.chtc.wisc.edu/SQUID/SLIBS.tar.gz # # Tell HTCondor what amount of compute resources # each job will need on the computer where it runs. request_cpus = 1 request_memory = 3GB request_disk = 1GB # requirements = (OpSysMajorVer == 6) || (OpSysMajorVer == 7) # which computer grids to use: +WantFlocking = true +WantGlideIn = true # Tell HTCondor to run instances of our job: queue 1000
The ordering of the arguments on in the line that starts with arguments =
dictates the numbers that are assigned to each argument. These numbers are used in the shell script, above.
SQUID
The UW-Madison CHTC provides for each user disk space on a web proxy. More information about SQUID
is available online.
In practice, we placed files containing founder allele dosages (for chromosomes of interest) and our (compressed) R installation on SQUID
.
We followed the usual procedure for submitting R jobs with the CHTC. To submit the file boot-Recla-10-22.sub
, we first changed into its directory. We then typed
```{bash, eval = FALSE} condor_submit boot-Recla-10-22.sub
By using this command from within the `submit_files` subdirectory, we ensure that our outputted files are returned to the `submit_files` subdirectory. We then manually moved the outputted results files to the `results` subdirectory. We found that up to ten percent of jobs would initially fail and require re-submission. There are a variety of reasons that may explain the need for re-submission. Because of this need, we include in the git repository the file `boot-Recla-10-22-fix.sub`, which gives the code that we used for resubmission. The file `bad-jobs-run561` gives the ids of the jobs that failed the initial runs. ## Session info ```r devtools::session_info()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.