Re-running the whole sampling analysis takes at least a week and requires a supercomputer. If you would just like to load the results of the analysis for exploration, download the GitHub repository (`diazrenata/scadsanalysis`) or the Zenodo archive and run `analysis/helper_scripts/load_results_for_exploring.R`. This sets up `all_di`, the main dataframe of results. It has many columns, which are described below. If you would like to re-create the figures for the manuscript, you can re-render the reports in `analysis/reports/submission2`: `manuscript_main_rev.Rmd` renders the figures and tables in the main text, and the reports in the `appendices` folder render each of the supplementary documents.
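A minimal sketch of that loading step, assuming your working directory is the root of the downloaded repository or Zenodo archive. It only sources the helper script named above and prints the size of `all_di`; the guard and message are illustrative additions, not part of the compendium.

```r
# Assumes the working directory is the root of a downloaded copy of
# diazrenata/scadsanalysis (GitHub repository or Zenodo archive).
loader <- file.path("analysis", "helper_scripts", "load_results_for_exploring.R")

if (file.exists(loader)) {
  source(loader)            # sets up `all_di`, the main results dataframe
  if (exists("all_di")) {
    print(dim(all_di))      # number of rows and columns
  }
} else {
  message("Not found: ", loader, " (is the working directory the repository root?)")
}
```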
The full analytical pipeline can be replicated by installing `scadsanalysis` as an R package and running a series of scripts. It requires large amounts of memory and storage (terabytes) and needs parallelization to run in a reasonable amount of time. We run it on the UF HiPerGator; to use a different cluster, you would need to change the settings.
For those interested in replicating a subset of the analyses with less compute, we have included instructions for running a subset locally.
You will need to download and install the package for this research compendium, as well as the packages that run the analytical workflow and, optionally, the package needed for HPC.
```r
remotes::install_github("diazrenata/scadsanalysis")
install.packages("drake")
install.packages("DBI")
install.packages("storr")
install.packages("clustermq") # Only if you are running on HPC

library(scadsanalysis)
```
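If you are re-installing on a machine that already has some of these packages, a small helper can skip the ones that are present. This is a sketch, not part of `scadsanalysis`: `missing_packages()` and `install_missing()` are hypothetical names, and the helper assumes `remotes` is already installed, as the commands above do.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Hypothetical helper: install only the absent dependencies.
# Set use_hpc = TRUE to include clustermq (needed only on HPC).
install_missing <- function(use_hpc = FALSE) {
  cran_pkgs <- c("drake", "DBI", "storr", if (use_hpc) "clustermq")
  to_install <- missing_packages(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  if (!requireNamespace("scadsanalysis", quietly = TRUE)) {
    remotes::install_github("diazrenata/scadsanalysis")
  }
  invisible(to_install)
}
```

On a cluster you would call `install_missing(use_hpc = TRUE)`; locally, `install_missing()` is enough.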
To download and prep the main datasets:
```r
download_data()
```
To subsample the main datasets (subsampling was added after the core analyses were completed), render `analysis/helper_scripts/jacknife.Rmd`. You may need to create a directory called `analysis/rev_prototyping/jacknifed_datasets` first.
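Both steps can be sketched in R as follows, assuming the working directory is the repository root and that the `rmarkdown` package is available for rendering. `dir.create()` with `showWarnings = FALSE` makes the directory step safe to repeat.

```r
# Assumes the working directory is the repository root.
jk_dir <- file.path("analysis", "rev_prototyping", "jacknifed_datasets")
dir.create(jk_dir, recursive = TRUE, showWarnings = FALSE)  # no-op if it already exists

# Render the subsampling report (requires the rmarkdown package).
jk_report <- file.path("analysis", "helper_scripts", "jacknife.Rmd")
if (file.exists(jk_report)) {
  rmarkdown::render(jk_report)
}
```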
For sampling, you will need tables listing the number of possible SADs for combinations of S and N. This analysis uses 3 tables covering different swaths of SxN space. The first two can be created by running `analysis/helper_scripts/make_p_wide.R` and `analysis/helper_scripts/make_p_mamm.R`, and the largest one by running `analysis/helper_scripts/ptables_plan.R` (RMD ran this on an HPC cluster using `submit_p_pipeline.sbatch`). The files are too large for GitHub, but will be uploaded to Zenodo and made available for download there.
To run a subset locally, you can just run `analysis/helper_scripts/make_p_mamm.R` or download `masterp_mamm.Rds` to the `analysis` directory.
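A sketch of the local option, which builds the smaller table only if it is not already in place. It assumes the repository root as working directory and that `make_p_mamm.R` writes `masterp_mamm.Rds` into `analysis/` (the location the text gives for the download); check the script if the file ends up elsewhere.

```r
# Assumption: make_p_mamm.R writes masterp_mamm.Rds into analysis/,
# the same place the downloaded file is meant to go.
p_table_path <- file.path("analysis", "masterp_mamm.Rds")
p_script <- file.path("analysis", "helper_scripts", "make_p_mamm.R")

if (!file.exists(p_table_path) && file.exists(p_script)) {
  source(p_script)  # builds the table of possible SADs for the smaller S x N range
}

p_table_ready <- file.exists(p_table_path)
if (!p_table_ready) {
  message("Download masterp_mamm.Rds into analysis/ or run ", p_script)
}
```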
Each dataset is run in its own (identical) pipeline, and the results are collected at the end. Either source one of the pipelines directly or submit it via sbatch. Running one of the full pipeline files locally is not recommended.
To run on the HiPerGator, ssh in and run:

```
sbatch submit_bbs_pipeline.sbatch
```

substituting whichever dataset name you want to run.
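If you want to submit several datasets from an R session on the cluster, the substitution can be scripted. This is a sketch: `sbatch_file()` is a hypothetical helper that assumes every dataset follows the `submit_<dataset>_pipeline.sbatch` naming of the BBS file, and `datasets` holds example names only. Nothing is submitted unless `sbatch` is on the PATH.

```r
# Hypothetical helper: sbatch file name for a dataset, following the
# submit_<dataset>_pipeline.sbatch pattern of submit_bbs_pipeline.sbatch.
sbatch_file <- function(dataset) sprintf("submit_%s_pipeline.sbatch", dataset)

datasets <- c("bbs")  # example only; list the datasets you want to run

if (nzchar(Sys.which("sbatch"))) {  # only submit where SLURM's sbatch exists
  for (ds in datasets) {
    f <- sbatch_file(ds)
    if (file.exists(f)) system2("sbatch", f) else message("No such file: ", f)
  }
}
```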
Running on a different HPC cluster would take some setup. On the HiPerGator, we're using `drake` and SQLite caches to manage the pipelines, and `clustermq` to handle parallelization with the SLURM scheduler.
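The sketch below illustrates the general drake + SQLite cache + clustermq pattern described above; it is not the repository's pipeline code. The two-target plan is a toy example, `RSQLite` is an assumed extra dependency (it is not in the install list above), the cache file `drake-cache-demo.sqlite` and the SLURM template `slurm_clustermq.tmpl` are hypothetical names, and `jobs = 4` is arbitrary. Off a SLURM cluster it falls back to running locally.

```r
use_slurm <- nzchar(Sys.which("sbatch"))  # crude check for a SLURM environment
deps <- c("drake", "DBI", "RSQLite", "storr")

if (all(vapply(deps, requireNamespace, logical(1), quietly = TRUE))) {
  # Toy plan for illustration; the real plans live in the pipeline scripts.
  plan <- drake::drake_plan(
    squares = (1:5)^2,
    total = sum(squares)
  )

  # drake cache stored in a single SQLite file via storr's DBI backend.
  db <- DBI::dbConnect(RSQLite::SQLite(), "drake-cache-demo.sqlite")
  cache <- storr::storr_dbi("datatable", "keystable", db)

  if (use_slurm && requireNamespace("clustermq", quietly = TRUE)) {
    # Send targets to SLURM workers through clustermq.
    options(clustermq.scheduler = "slurm",
            clustermq.template = "slurm_clustermq.tmpl")  # hypothetical template
    drake::make(plan, cache = cache, parallelism = "clustermq", jobs = 4)
  } else {
    drake::make(plan, cache = cache)  # sequential local run
  }

  print(drake::readd(total, cache = cache))  # should print 55
  DBI::dbDisconnect(db)
}
```

Keeping all targets in one SQLite cache is what lets the results from a pipeline be collected later from a single file.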
To run a subset locally, you can run `mcdb_pipeline_demo.R`. This will run the pipeline on the first 50 communities in the Mammal Community Database, drawing up to 200 samples from each feasible set. It takes about 20 minutes on a MacBook Air with 8 GB of memory and creates a 51 MB cache file.
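The text does not say which folder holds `mcdb_pipeline_demo.R`, so this sketch searches for it from the working directory (assumed to be the repository root) and sources the first match, timing the run. Whether the demo expects to be run from the repository root or from its own folder is an assumption here; adjust if it fails to find its inputs.

```r
# Locate the demo script anywhere under the working directory.
demo_script <- list.files(".", pattern = "^mcdb_pipeline_demo\\.R$",
                          recursive = TRUE, full.names = TRUE)

if (length(demo_script) > 0) {
  t0 <- Sys.time()
  source(demo_script[[1]])   # runs the 50-community MCDB demo pipeline
  print(Sys.time() - t0)     # elapsed time for the run
} else {
  message("mcdb_pipeline_demo.R not found under ", normalizePath("."))
}
```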
Run `analysis/helper_scripts/make_all_di.R`, `analysis/helper_scripts/make_all_di_jk.R`, `analysis/helper_scripts/make_all_ct.R`, and `analysis/helper_scripts/make_all_ct_jk.R` to collect the results from each of the pipelines into dataframes. If you ran the pipelines on an HPC cluster, run these scripts there and copy the .csv files they create (currently stored at `analysis/reports/submission2/all_di.csv`, `all_di_jk.csv`, etc.) to your local computer for viewing.
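A sketch of running the four collection scripts in order and then reading the main results file, assuming the repository root as working directory. Missing scripts are skipped with a message rather than stopping the loop; the `all_di.csv` path is the one given above.

```r
# The four collection scripts, in the order listed above.
collect_scripts <- file.path("analysis", "helper_scripts",
  c("make_all_di.R", "make_all_di_jk.R", "make_all_ct.R", "make_all_ct_jk.R"))

for (s in collect_scripts) {
  if (file.exists(s)) source(s) else message("Skipping missing script: ", s)
}

# Main results table created by make_all_di.R
all_di_csv <- file.path("analysis", "reports", "submission2", "all_di.csv")
if (file.exists(all_di_csv)) {
  all_di <- read.csv(all_di_csv)
  print(dim(all_di))  # rows and columns of the collected results
}
```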
For the local subset, you can run `analysis/helper_scripts/make_all_di_demo.R`. This will create the analogous results files for just the subset you ran.
If you want to explore the results without re-running the analyses, these files are stored on GitHub and Zenodo at the paths above, so you can download the archive and work directly from the .csv files.
R Markdown files for generating all of the figures and tables in the manuscript and supplements are in `analysis/reports/submission2`. `manuscript_main_rev.Rmd` renders the figures in the main text, and the .Rmd files in `analysis/reports/submission2/appendices` render each of the supplementary documents. Once you have run the `make_all_X` scripts or downloaded the results files, you can render any of these reports.
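A sketch of rendering the main-text report and every appendix in one pass, assuming the repository root as working directory and an installed `rmarkdown` package. Each report is rendered in a fresh environment (`envir = new.env()`) so objects from one report cannot leak into the next.

```r
report_dir <- file.path("analysis", "reports", "submission2")

# Main-text report first, then every .Rmd in the appendices folder.
reports <- c(
  file.path(report_dir, "manuscript_main_rev.Rmd"),
  list.files(file.path(report_dir, "appendices"),
             pattern = "\\.Rmd$", full.names = TRUE)
)

for (r in reports[file.exists(reports)]) {
  rmarkdown::render(r, envir = new.env())
}
```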
To load the results for further exploration, you can start from the code in `analysis/helper_scripts/load_results_for_exploring.R`. This is the same setup snippet that begins the .Rmd files.
`all_di` will be your main dataframe for exploring. It has many columns; their descriptions are listed in `all_di_columns.csv`:
```r
column_descriptions <- read.csv(here::here("analysis", "reports", "submission2", "all_di_columns.csv"))
column_descriptions
```
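Once both `all_di` and `column_descriptions` are loaded, a quick check can flag any columns that lack a description. `undocumented_columns()` is a hypothetical helper, and it assumes the column names sit in the first column of `all_di_columns.csv`; pass a different `name_col` if the file is laid out otherwise.

```r
# Hypothetical helper: columns of `df` with no entry in the descriptions table.
# Assumes column names are stored in column `name_col` (the first by default).
undocumented_columns <- function(df, descriptions, name_col = 1) {
  setdiff(names(df), as.character(descriptions[[name_col]]))
}
```

For example, `undocumented_columns(all_di, column_descriptions)` returns `character(0)` when every column is documented.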