To run R code on the cluster of the University of Bern, please follow the instructions on the UBELIX help page. The documentation there is very good and also covers using R on the cluster.
- get user profile
- log in with ssh <username>@submit.unibe.ch
- install R and the required packages
- run the examples

Commands for the steps above:

    module load vital-it
    module avail 2>&1 | grep " R\/"  # check if your R version is there, e.g. 3.6.1 for me
    # module load R/3.6.1
    module load R/latest

    # install packages
    R  # open R and within R, type:
    install.packages("gdm")
    # accept using personal library
    # select CRAN
    quit()

    # check how far the job is
    squeue -u <username>

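Model input files need to be uploaded to the cluster and the results downloaded again:

    # log in to the cluster
    ssh <username>@submit.unibe.ch
    # upload a file (run on your local machine)
    scp /path/to/file <username>@submit.unibe.ch:path/to/target_dir/
    # download a file (run on your local machine)
    scp <username>@submit.unibe.ch:path/to/file /path/to/target_dir/
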
source : https://docs.id.unibe.ch/ubelix/getting-started
Two scripts are required: an R script, which contains the R code, and a SLURM script, which contains the cluster code.
- R script : gdm.R
- cluster script : Rbatch.sh
To run in parallel, you need to give the number of cores in (1) the SLURM script and (2) the R script. In R, `Sys.getenv("SLURM_CPUS_PER_TASK")` can be used to read the value set in the SLURM script.
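Because the number of cores is given in two places, the two values can drift apart. The following is a minimal sketch of a check that compares them before submitting. The function names `slurm_cores`, `r_cores` and `check_cores` are hypothetical helpers, not part of SLURM or UBELIX, and the sketch assumes the R script contains a literal `cores = <number>` argument. If the R script reads `SLURM_CPUS_PER_TASK` instead, this check is not needed.

```bash
#!/bin/bash
# Hypothetical helpers: compare the core count requested from SLURM
# with the one hard-coded in the R script.

# Print N from the first "#SBATCH --cpus-per-task=N" line of a batch script.
slurm_cores() {
  sed -n 's/^#SBATCH[[:space:]]*--cpus-per-task=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print N from the first "cores = N" argument found in an R script.
r_cores() {
  sed -n 's/.*cores[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 if both scripts use the same number of cores, 1 otherwise.
check_cores() {
  local batch="$1" rscript="$2"
  local a b
  a=$(slurm_cores "$batch")
  b=$(r_cores "$rscript")
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "ok: both scripts use $a cores"
  else
    echo "mismatch: $batch requests '$a' cores, $rscript uses '$b'" >&2
    return 1
  fi
}
```

After sourcing these functions, a possible call is `check_cores Rbatch.sh gdm.R && sbatch Rbatch.sh`, so that the job is only submitted when the two numbers agree.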
**gdm.R**

    name <- "gdm_Groundwater.recharge_LUI"
    gdm_model <- readRDS(paste("data/", name, "_input.Rds", sep = ""))

    # fullModelOnly = TRUE : test only the full variable set, not model - 1 variable
    # fullModelOnly = FALSE : estimate model significance and variable importance/ significance using m...
    # parallel = TRUE
    # cores : number of cores to be registered for parallel processing
    # outFile : name of outfile
    varimp_output <- gdm::gdm.varImp(gdm_model, geo = TRUE, parallel = TRUE, nPerm = 100, cores = 10, fullModelOnly = TRUE)
    saveRDS(varimp_output, file = paste("out/", name, "_perm.Rds", sep = ""))

**Rbatch.sh**

    #!/bin/sh
    #SBATCH --mail-user=<mail>
    #SBATCH --mail-type=end,fail
    #SBATCH --mem-per-cpu=8G
    #SBATCH --time=12:00:00
    #SBATCH --cpus-per-task=10
    #SBATCH --job-name="gdm_Groundwater.recharge_LUI" # change
    #SBATCH --output=gdm_Groundwater.recharge_LUI.out
    #SBATCH --error=gdm_Groundwater.recharge_LUI.err

    ############################# execute code here################################
    module load vital-it
    module load R/latest
    R CMD BATCH --no-save --no-restore gdm.R

Note: there used to be the option `#SBATCH --workdir=.` as well, which has been taken out because it did not work. All relative paths in the scripts are relative to the location from which `sbatch` has been run.
To submit the job to the cluster (this runs both scripts):

    sbatch Rbatch.sh
    squeue -u <username>

Cluster workflow:

1. change name in gdm.R to current .Rds file
2. change job name

Both names can be replaced automatically:

    sed -i 's/gdm_Root.decomposition_LUI/gdm_EFdistance_LUI/g' gdm.R Rbatch.sh

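A plain `sed -i` silently does nothing when the old name is not present in the files (for example after a typo), and an unescaped `.` in the pattern matches any character. The sketch below wraps the replacement in a small function that checks for the old name first. `rename_model` is a hypothetical helper, and it assumes that model names only contain letters, digits, `_` and `.`.

```bash
#!/bin/bash
# Hypothetical helper: rename_model OLD NEW FILE...
# Replace the literal model name OLD by NEW in every FILE and fail loudly
# if OLD does not occur in one of them.
# Assumption: names only contain letters, digits, "_" and ".".
rename_model() {
  local old="$1" new="$2"
  shift 2
  local file old_re
  # Escape the dots so that sed treats them as literal characters.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # First pass: make sure the old name is present everywhere.
  for file in "$@"; do
    if ! grep -qF -- "$old" "$file"; then
      echo "rename_model: '$old' not found in $file" >&2
      return 1
    fi
  done
  # Second pass: replace in place (GNU sed syntax, as used on the cluster).
  for file in "$@"; do
    sed -i "s/${old_re}/${new}/g" "$file"
  done
}
```

A possible call, mirroring the command above, is `rename_model gdm_Root.decomposition_LUI gdm_EFdistance_LUI gdm.R Rbatch.sh`.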
outfiles:

- gdm_EFdistance_LUI_permutation.Rds : 4 permutations

value:

1. table: summarizes full model deviance, percent deviance explained by the full model, p-value of the full model, and number of permutations
2. table: variable importance
3. table: variable significance
4. table: number of permutations used to calculate the statistics for that model (some GDMs may fail to fit for some permutations/variable combinations)

Significance is estimated using the bootstrapped p-value when the variable has been permuted.

    # show the p-values
    gdmperm[[3]][,1]
    # show the number of permutations used
    gdmperm[[4]][,1]

With plotUncertainty: get the standard deviation (sd) of the maxspline values.
Use Rbatch.sh from above, and only modify the last line and the cpus per task:

```{bash, eval = F}
module load vital-it
module load R/latest
R CMD BATCH --no-save --no-restore gdm_uncertainty.R
```

**gdm_uncertainty.R**

```r
library(gdm)
name <- "gdm_EFturnover_0.8_LUI"
gdm_model <- readRDS(paste("data/", name, "_input.Rds", sep = ""))

# load the required function
source("plotUncertainty_slim_GDM.R")
exists("plotUncertainty_slim")

plotuncertainty_output <- plotUncertainty_slim(spTable = gdm_model, sampleSites = 0.3, bsIters = 100, geo = TRUE, cores = 2)
# saveRDS(plotuncertainty_output, file = paste("out/", name, "_uncertainty.Rds", sep = "")) # uncomment me
saveRDS(plotuncertainty_output, file = paste("vignettes/out/", name, "_uncertainty.Rds", sep = "")) # deleteme
```
Note: 2 cores, 50 iterations took 40 seconds.

If something does not work: (1) check if you loaded the required modules and (2) check what's written in the *.Rout file! (Not the .err and .out files, but the .Rout one with the output from R.)
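To look at the .Rout file quickly, something like the following sketch can help. It prints the lines that look like problems, followed by the last lines of the file. `check_rout` is a hypothetical helper, and the search words (`error`, `warning`, `execution halted`) are an assumption about what typical R messages look like, so the full file is still worth reading if nothing is found.

```bash
#!/bin/bash
# Hypothetical helper: check_rout [FILE]
# Show suspicious lines of an .Rout file and its last few lines.
check_rout() {
  local rout="${1:-gdm_uncertainty.Rout}"
  if [ ! -f "$rout" ]; then
    echo "check_rout: $rout does not exist (did the job start?)" >&2
    return 2
  fi
  # -n: print line numbers, -i: ignore case, -E: extended regular expression
  grep -n -i -E 'error|warning|execution halted' "$rout" \
    || echo "no error or warning lines found"
  echo "--- last lines of $rout ---"
  tail -n 5 "$rout"
}
```

Called without an argument, it looks for `gdm_uncertainty.Rout` in the current directory, which is where `R CMD BATCH gdm_uncertainty.R` writes its output by default.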
Run permutations by:

```{bash, eval = F}
sbatch Rbatch.sh
```
Running for EFturnover (rename the model names in both scripts automatically):

```{bash}
sed -i 's/gdm_EFturnover_0.1_LUI/gdm_EFturnover_0.2_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.2_LUI/gdm_EFturnover_0.3_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.3_LUI/gdm_EFturnover_0.4_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.4_LUI/gdm_EFturnover_0.5_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.5_LUI/gdm_EFturnover_0.6_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.6_LUI/gdm_EFturnover_0.7_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.7_LUI/gdm_EFturnover_0.8_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.8_LUI/gdm_EFturnover_0.9_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
```

Running for EFnestedness:

```{bash}
sed -i 's/gdm_EFturnover_0.9_LUI/gdm_EFnestedness_0.1_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.1_LUI/gdm_EFnestedness_0.2_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.2_LUI/gdm_EFnestedness_0.3_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.3_LUI/gdm_EFnestedness_0.4_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.4_LUI/gdm_EFnestedness_0.5_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.5_LUI/gdm_EFnestedness_0.6_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.6_LUI/gdm_EFnestedness_0.7_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.7_LUI/gdm_EFnestedness_0.8_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.8_LUI/gdm_EFnestedness_0.9_LUI/g' gdm_uncertainty.R Rbatch.sh
```
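The chain of `sed` and `sbatch` calls above can also be written as a loop. The sketch below is one possible variant, not the procedure that was actually used: instead of editing the two scripts in place, it writes one copy of each script per model and submits that copy. As far as I understand, SLURM keeps its own copy of the batch script at submission time, but the R script is only read when the job starts, so an in-place edit made while a job is still queued could change what that job runs. Separate copies avoid this and also give every job its own .Rout file. The function name `submit_all`, the `SUBMIT` variable and the per-model file names are hypothetical choices.

```bash
#!/bin/bash
# Hypothetical helper: submit_all TEMPLATE_NAME COMPONENT
# TEMPLATE_NAME: model name currently written in gdm_uncertainty.R and Rbatch.sh
# COMPONENT:     e.g. EFturnover or EFnestedness
# Set SUBMIT=echo for a dry run that only prints what would be submitted.
submit_all() {
  local template_name="$1" component="$2"
  local submit="${SUBMIT:-sbatch}"
  local thr name template_re
  # Escape the dots so that sed treats them as literal characters.
  template_re=$(printf '%s' "$template_name" | sed 's/\./\\./g')
  for thr in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9; do
    name="gdm_${component}_${thr}_LUI"
    # One copy of the R script per model.
    sed "s/${template_re}/${name}/g" gdm_uncertainty.R > "${name}_uncertainty.R"
    # One copy of the batch script per model, pointing at that R script.
    sed -e "s/${template_re}/${name}/g" \
        -e "s/gdm_uncertainty\.R/${name}_uncertainty.R/g" \
        Rbatch.sh > "${name}_Rbatch.sh"
    "$submit" "${name}_Rbatch.sh"
  done
}
```

For example, `SUBMIT=echo submit_all gdm_EFturnover_0.8_LUI EFnestedness` writes the nine script pairs and prints the batch script names without submitting anything; leaving out `SUBMIT=echo` calls `sbatch` on each of them.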
The results are stored in BetaDivMultifun/analysis/output_datasets/uncertainty_calc.
Observation: some models fail with the following warning: "The algorithm was unable to fit a model to your data. The sum of all spline coefficients = 0 and deviance explained = NULL. Returning NULL object." This can be interpreted as a convergence error of a very poor model. The following two models often failed:

- EFturnover 0.8 and EFturnover 0.9, which failed directly or took longer than 1h.

The models were run for as long as it took them to converge. 100% convergence was not required, but the number of converged models is reported in the output.