Re-running the whole sampling analysis takes at least a week and requires a supercomputer. If you would like to just load the results from the analysis for exploration, download the diazrenata/scadsanalysis repository or the Zenodo archive and run analysis/helper_scripts/load_results_for_exploring.R. This loads all_di, the main dataframe for results. It has many columns, which are described below.
If you would like to re-create the figures for the manuscript, you can re-render the reports in analysis/reports/submission2. manuscript_main_rev.Rmd renders the figures and tables in the main text, and the reports in the appendices folder render each of the supplementary documents.
You will need to download and install the package for this research compendium, as well as the packages that run the analytical workflow (drake, DBI, storr) and, optionally, clustermq for running on an HPC cluster.
remotes::install_github("diazrenata/scadsanalysis")
install.packages("drake")
install.packages("DBI")
install.packages("storr")
install.packages("clustermq")
library(scadsanalysis)
To download and prep the main datasets:
download_data()
To subsample the main datasets (subsampling was added after the core analyses were completed), render analysis/helper_scripts/jacknife.Rmd. You may need to create a directory called analysis/rev_prototyping/jacknifed_datasets first.
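A minimal sketch of these two steps in R, assuming the here and rmarkdown packages are installed. prepare_jackknife() is a hypothetical helper, not part of scadsanalysis: it creates the output directory if it does not already exist and then renders the subsampling report.

```r
# Hypothetical helper (not part of scadsanalysis): create the directory the
# subsampling report writes to, then render analysis/helper_scripts/jacknife.Rmd.
prepare_jackknife <- function(root = here::here(), render = TRUE) {
  out_dir <- file.path(root, "analysis", "rev_prototyping", "jacknifed_datasets")
  # recursive = TRUE creates missing parent folders; no warning if it exists
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  if (render) {
    # Render in a fresh environment so objects from your session don't leak in
    rmarkdown::render(
      file.path(root, "analysis", "helper_scripts", "jacknife.Rmd"),
      envir = new.env()
    )
  }
  invisible(out_dir)
}
```

From the project root, prepare_jackknife() does both steps; prepare_jackknife(render = FALSE) only creates the directory.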
For sampling, you will need tables listing the number of possible SADs for combinations of S and N. This analysis uses 3 tables covering different swaths of SxN space. The first two can be created by running analysis/helper_scripts/make_p_wide.R and analysis/helper_scripts/make_p_mamm.R, and the largest one by running analysis/helper_scripts/ptables_plan.R (RMD ran this on an HPC cluster using submit_p_pipeline.sbatch). The files are too large for GitHub, but will be uploaded to Zenodo and made available for download there.
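To illustrate what these tables hold, here is a small base-R sketch. It assumes, as in the feasible-set approach, that a possible SAD is an unordered split of N individuals among S species with every species having at least one individual, i.e. an integer partition of N into exactly S parts. count_sad_table() and n_possible_sads() are hypothetical names; the project's scripts may build and store these tables differently (for example with arbitrary-precision integers, since the counts grow very quickly).

```r
# Sketch: table of P(n, k), the number of integer partitions of n into exactly
# k positive parts, for 0 <= n <= max_n and 0 <= k <= max_s.
# Recurrence: P(n, k) = P(n - 1, k - 1) + P(n - k, k)
#   - partitions containing a part equal to 1: drop it -> P(n - 1, k - 1)
#   - partitions with all parts >= 2: subtract 1 from every part -> P(n - k, k)
# Counts are stored as doubles, so they stop being exact above 2^53.
count_sad_table <- function(max_s, max_n) {
  # Row n + 1 holds n, column k + 1 holds k (R indices start at 1)
  p <- matrix(0, nrow = max_n + 1, ncol = max_s + 1)
  p[1, 1] <- 1  # P(0, 0) = 1: the empty partition
  for (n in seq_len(max_n)) {
    for (k in seq_len(min(n, max_s))) {
      p[n + 1, k + 1] <- p[n, k] + p[n - k + 1, k + 1]
    }
  }
  p
}

# Look up the number of possible SADs for S species and N individuals
n_possible_sads <- function(tab, s, n) tab[n + 1, s + 1]

tab <- count_sad_table(max_s = 3, max_n = 10)
n_possible_sads(tab, s = 3, n = 10)  # 8
```

The 8 partitions of 10 into 3 parts are 8+1+1, 7+2+1, 6+3+1, 6+2+2, 5+4+1, 5+3+2, 4+4+2 and 4+3+3. Filling the table bottom-up reuses every smaller count, which is why one pass covers a whole swath of SxN space.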
To run a subset locally, you can just run analysis/helper_scripts/make_p_mamm.R or download masterp_mamm.Rds to the analysis directory.
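For the local subset, a helper like the hypothetical load_mamm_ptable() below reads masterp_mamm.Rds if it is already in the analysis directory and otherwise sources make_p_mamm.R first. It assumes, without confirmation from this README, that make_p_mamm.R writes its output to analysis/masterp_mamm.Rds.

```r
# Hypothetical helper: load the mammal P table, generating it if needed.
# Assumption: make_p_mamm.R saves its result as analysis/masterp_mamm.Rds.
load_mamm_ptable <- function(root = here::here()) {
  rds <- file.path(root, "analysis", "masterp_mamm.Rds")
  if (!file.exists(rds)) {
    source(file.path(root, "analysis", "helper_scripts", "make_p_mamm.R"))
  }
  if (!file.exists(rds)) stop("masterp_mamm.Rds not found at ", rds)
  readRDS(rds)
}
```

Usage from the project root: ptab <- load_mamm_ptable().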
Each dataset is run in its own (identical) pipeline, and we collect the results at the end. Either source one of the pipelines directly or run it via sbatch. Running one of the pipeline files locally is not recommended.
To run on the HiPerGator, ssh in and run:
sbatch submit_bbs_pipeline.sbatch
substituting the name of the dataset you want to run for bbs.
Running on a different HPC cluster would take some setup. On the HiPerGator, we’re using drake and SQLite caches to manage the pipelines, and clustermq to handle parallelization with the SLURM scheduler.
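The setup described above can be sketched in R as follows. This is not the project's pipeline code: run_dataset_pipeline() is a hypothetical wrapper, the table names and the SLURM template file name are placeholders, and it additionally assumes the RSQLite package (the SQLite driver for DBI) is installed. It keeps the drake cache in an SQLite database through storr::storr_dbi() and, when use_slurm = TRUE, tells clustermq to submit its workers through SLURM.

```r
# Hypothetical wrapper: run one dataset's drake plan with an SQLite-backed
# cache, optionally parallelized over SLURM with clustermq.
# Packages are called with :: so defining the function needs none of them.
run_dataset_pipeline <- function(plan, db_path, use_slurm = FALSE, jobs = 1L,
                                 template = "slurm_clustermq.tmpl") {
  # Open (or create) the SQLite file and wrap it as a storr cache for drake
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  cache <- storr::storr_dbi("datatable", "keystable", con)

  if (use_slurm) {
    # clustermq reads these options when it starts workers; restore them after
    old <- options(clustermq.scheduler = "slurm",
                   clustermq.template = template)
    on.exit(options(old), add = TRUE)
    drake::make(plan, cache = cache, parallelism = "clustermq", jobs = jobs)
  } else {
    drake::make(plan, cache = cache)
  }
  invisible(db_path)
}
```

For a quick local check with a toy plan: run_dataset_pipeline(drake::drake_plan(x = 1 + 1), "demo_cache.sqlite"). Keeping each dataset in its own cache file mirrors the one-pipeline-per-dataset layout, so a failed run only affects that dataset.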
To run a subset locally, you can run mcdb_pipeline_demo.R. This will run the pipeline on the first 100 communities in the Mammal Community Database. It takes [time] on a MacBook Air with 8GB memory and creates a [size] cache file.
Run analysis/helper_scripts/make_all_di.R, analysis/helper_scripts/make_all_di_jk.R, analysis/helper_scripts/make_all_ct.R and analysis/helper_scripts/make_all_ct_jk.R to collect the results from each of the pipelines into dataframes. If you ran the pipelines on an HPC cluster, run these scripts there and copy the .csvs they create (currently stored at analysis/reports/submission2/all_di.csv, all_di_jk.csv, etc.) to your local computer for viewing.
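A minimal sketch for running the four collection scripts in order, assuming the here package is available. collect_results() is a hypothetical helper; with run = FALSE it only returns the script paths, and with run = TRUE it stops early if any script is missing.

```r
# Hypothetical helper: source the four collection scripts in order.
collect_results <- function(root = here::here(), run = TRUE) {
  scripts <- file.path(root, "analysis", "helper_scripts",
                       c("make_all_di.R", "make_all_di_jk.R",
                         "make_all_ct.R", "make_all_ct_jk.R"))
  missing <- scripts[!file.exists(scripts)]
  if (run && length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  if (run) {
    # source() evaluates each script in the global environment by default
    for (s in scripts) source(s)
  }
  invisible(scripts)
}
```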
For the local subset, you can run analysis/helper_scripts/make_all_di_demo.R. This will create the analogous results files for just the subset you ran.
If you want to explore the results without re-running the analyses, these files are stored on GitHub and Zenodo at the paths above. You can just download the archive and work from these .csvs.
RMarkdown files for generating all of the figures and tables in the manuscript and supplements are in analysis/reports/submission2. manuscript_main_rev.Rmd renders the figures in the main text, and the .Rmd files in analysis/reports/submission2/appendices render each of the supplementary documents. Once you have run the make_all_X scripts, or downloaded the results files, you can render any of these reports.
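A minimal sketch for rendering all of these reports in one go, assuming the here and rmarkdown packages are installed. render_reports() is a hypothetical helper: it lists manuscript_main_rev.Rmd plus every .Rmd in the appendices folder and renders each in a fresh environment.

```r
# Hypothetical helper: list (and optionally render) the manuscript report and
# every appendix .Rmd under analysis/reports/submission2.
render_reports <- function(root = here::here(), run = TRUE) {
  report_dir <- file.path(root, "analysis", "reports", "submission2")
  rmds <- c(
    file.path(report_dir, "manuscript_main_rev.Rmd"),
    list.files(file.path(report_dir, "appendices"),
               pattern = "\\.Rmd$", full.names = TRUE)
  )
  if (run) {
    for (f in rmds) {
      # A fresh environment per report keeps each render independent
      rmarkdown::render(f, envir = new.env())
    }
  }
  invisible(rmds)
}
```

Call render_reports(run = FALSE) first to see which files would be rendered.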
To load the results for further exploration, you can start from the code in analysis/helper_scripts/load_results_for_exploring.R. This is the same setup snippet that begins the .Rmd files.
all_di will be your main dataframe for exploring. It has many columns; here is what they mean:
column_descriptions <- read.csv(here::here("analysis", "reports", "submission2", "all_di_columns.csv" ))
column_descriptions
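When exploring all_di, it can help to check that every column you use has a description. The sketch below is hypothetical: load_all_di() and undocumented_columns() are illustrative names, it reads all_di.csv from analysis/reports/submission2 (where the collection scripts store it), and it assumes the first column of all_di_columns.csv holds the column names (pass a different name_col if it does not).

```r
# Hypothetical helpers for exploring the results dataframe.
# Read all_di.csv and its column descriptions from the reports directory.
load_all_di <- function(root = here::here()) {
  report_dir <- file.path(root, "analysis", "reports", "submission2")
  list(
    all_di = read.csv(file.path(report_dir, "all_di.csv")),
    descriptions = read.csv(file.path(report_dir, "all_di_columns.csv"))
  )
}

# Names of columns in `results` that have no entry in `descriptions`.
# Assumption: column `name_col` of `descriptions` holds the column names.
undocumented_columns <- function(results, descriptions, name_col = 1) {
  setdiff(names(results), as.character(descriptions[[name_col]]))
}
```

Usage: di <- load_all_di(); undocumented_columns(di$all_di, di$descriptions).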