Cluster

In order to run R code on the cluster of the University of Bern, please follow the instructions on the UBELIX help page. The documentation there is excellent and also covers using R on the cluster. In short:
- get a user profile
- log in with ssh <username>@submit.unibe.ch
- install R and the required packages
- run the examples

module load vital-it
module avail 2>&1 | grep " R\/"
# check if your R version is there, e.g. 3.6.1 for me
# module load R/3.6.1
module load R/latest

# install packages
R # open R; within R, type:
install.packages("gdm")
# accept using personal library
# select CRAN
quit()

# check how far along the job is
squeue -u <username>

Data

The model input data need to be uploaded: the _input.Rds files, which contain the site-pair tables used for the gdm models.
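For illustration, a minimal sketch (with the hypothetical object name sitepair_table) of how such an input file might be written locally before uploading it:

# save a gdm site-pair table under the <name>_input.Rds naming scheme used below
name <- "gdm_Groundwater.recharge_LUI"
saveRDS(sitepair_table, file = paste(name, "_input.Rds", sep = ""))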

# log in to the cluster
ssh <username>@submit.unibe.ch

# upload a file from the local machine to the cluster
scp /path/to/file <username>@submit.unibe.ch:path/to/target_dir/

# download a file from the cluster to the local machine
scp <username>@submit.unibe.ch:path/to/file /path/to/target_dir/

Source: https://docs.id.unibe.ch/ubelix/getting-started

Varimp (p-values)

Two scripts are required: an R script containing the R code and a SLURM script containing the cluster instructions.
- R script: gdm.R
- cluster script: Rbatch.sh

To run in parallel, the number of cores needs to be given in both (1) the SLURM script and (2) the R script. Sys.getenv("SLURM_CPUS_PER_TASK") can be used to read the value inside R.
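A minimal sketch of how the core count requested in the SLURM script could be read inside the R script instead of hard-coding it (the variable name n_cores is just an example):

# read the number of cores SLURM assigned to this job; fall back to 1 if the variable is not set
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
# n_cores can then be passed to the cores argument of gdm::gdm.varImp()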

gdm.R

name <- "gdm_Groundwater.recharge_LUI"
gdm_model <- readRDS(paste("data/", name, "_input.Rds", sep = ""))

# fullModelOnly = T : test only the full variable set, not the models with one variable removed
# fullModelOnly = F : estimate model significance and variable importance/ significance using m...
# parallel = T
# cores : number of cores to be registered for parallel processing
# outFile : name of outfile

varimp_output <- gdm::gdm.varImp(gdm_model, geo = T, parallel = T, nPerm = 100, cores = 10, fullModelOnly = T)

saveRDS(varimp_output, file = paste("out/", name, "_perm.Rds", sep = ""))

Rbatch.sh

#!/bin/sh
#SBATCH --mail-user=<mail>
#SBATCH --mail-type=end,fail

#SBATCH --mem-per-cpu=8G
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=10
#SBATCH --job-name="gdm_Groundwater.recharge_LUI"

# change
#SBATCH --output=gdm_Groundwater.recharge_LUI.out
#SBATCH --error=gdm_Groundwater.recharge_LUI.err

############################# execute code here################################ 
module load vital-it
module load R/latest

R CMD BATCH --no-save --no-restore gdm.R

Note: there is also the option #SBATCH --workdir=. which was taken out because it did not work. All relative paths in the script are relative to the directory from which sbatch was run.
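Because the paths in gdm.R are relative (data/ for input, out/ for output), those directories must therefore exist relative to the directory from which sbatch is run; a minimal sketch to create them from R (they could just as well be created with mkdir):

dir.create("data", showWarnings = FALSE)  # input directory read by gdm.R
dir.create("out", showWarnings = FALSE)   # output directory written to by gdm.R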

To submit the job to the cluster (this runs both scripts):

sbatch Rbatch.sh
squeue -u <username>

Workflow

Cluster workflow:

  1. change the name in gdm.R to the current .Rds file
  2. change the job name in Rbatch.sh
  3. sbatch Rbatch.sh
  4. record the job ID in this file

To replace the names in both scripts automatically:

sed -i 's/gdm_Root.decomposition_LUI/gdm_EFdistance_LUI/g' gdm.R Rbatch.sh

Output files:
- gdm_EFdistance_LUI_permutation.Rds: 4 permutations

The returned value contains:
  1. a table summarising the full model deviance, the percent deviance explained by the full model, the p-value of the full model, and the number of permutations
  2. a table of variable importance
  3. a table of variable significance
  4. the number of permutations used to calculate the statistics for that model (some GDMs may fail to fit for some permutation/variable combinations)

Significance is estimated using the bootstrapped p-value when the variable has been permuted.
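To inspect these values, the saved permutation output can first be read back into R; a minimal sketch (the object name gdmperm is the one used below, and the file name follows the saveRDS() call in gdm.R):

name <- "gdm_EFdistance_LUI"
gdmperm <- readRDS(paste("out/", name, "_perm.Rds", sep = ""))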

# show the p-values
gdmperm[[3]][,1]
# show the number of permutations used
gdmperm[[4]][,1]

Thresholds: sd of maxspline

With plotUncertainty, the sd of the maxspline values is obtained.

Use Rbatch.sh from above and only modify the last line and the cpus-per-task setting:

#!/bin/sh
#SBATCH --mail-user=noelle.schenk@ips.unibe.ch
#SBATCH --mail-type=end,fail

#SBATCH --mem-per-cpu=8G
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=2
#SBATCH --job-name="gdm_EFturnover_0.1_LUI"

# change
#SBATCH --output=gdm_EFturnover_0.1_LUI.out
#SBATCH --error=gdm_EFturnover_0.1_LUI.err

############################# execute code here ################################
module load vital-it
module load R/latest

R CMD BATCH --no-save --no-restore gdm_uncertainty.R

gdm_uncertainty.R
library(gdm)
name <- "gdm_EFturnover_0.8_LUI"
gdm_model <- readRDS(paste("data/", name, "_input.Rds", sep = ""))
# load the required function
source("plotUncertainty_slim_GDM.R")
exists("plotUncertainty_slim")

plotuncertainty_output <- plotUncertainty_slim(spTable = gdm_model, sampleSites = 0.3, bsIters = 100, geo = T, cores = 2)

saveRDS(plotuncertainty_output, file = paste("out/", name, "_uncertainty.Rds", sep = ""))
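Once the job has finished, the saved uncertainty output can be read back in and inspected; a minimal sketch (file name as in the saveRDS() call above):

name <- "gdm_EFturnover_0.8_LUI"
plotuncertainty_output <- readRDS(paste("out/", name, "_uncertainty.Rds", sep = ""))
str(plotuncertainty_output, max.level = 1)  # overview of the returned object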

Note: with 2 cores, 50 iterations took 40 seconds. If something does not work: (1) check whether you loaded the required modules and (2) check what is written in the *.Rout file (not the .err and .out files, but the .Rout file with the output from R).

Run the permutations by:

sbatch Rbatch.sh

Running for EFturnover (rename the model names in both scripts automatically):
sed -i 's/gdm_EFturnover_0.1_LUI/gdm_EFturnover_0.2_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.2_LUI/gdm_EFturnover_0.3_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.3_LUI/gdm_EFturnover_0.4_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.4_LUI/gdm_EFturnover_0.5_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.5_LUI/gdm_EFturnover_0.6_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.6_LUI/gdm_EFturnover_0.7_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.7_LUI/gdm_EFturnover_0.8_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFturnover_0.8_LUI/gdm_EFturnover_0.9_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh

Running for EFnestedness:

sed -i 's/gdm_EFturnover_0.9_LUI/gdm_EFnestedness_0.1_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.1_LUI/gdm_EFnestedness_0.2_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.2_LUI/gdm_EFnestedness_0.3_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.3_LUI/gdm_EFnestedness_0.4_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.4_LUI/gdm_EFnestedness_0.5_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.5_LUI/gdm_EFnestedness_0.6_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.6_LUI/gdm_EFnestedness_0.7_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.7_LUI/gdm_EFnestedness_0.8_LUI/g' gdm_uncertainty.R Rbatch.sh
sbatch Rbatch.sh
sed -i 's/gdm_EFnestedness_0.8_LUI/gdm_EFnestedness_0.9_LUI/g' gdm_uncertainty.R Rbatch.sh

The results are stored at BetaDivMultifun/analysis/output_datasets/uncertainty_calc.

Observation: some models fail with the following warning: "The algorithm was unable to fit a model to your data. The sum of all spline coefficients = 0 and deviance explained = NULL. Returning NULL object." This can be interpreted as a convergence error of a very poor model. Two models often failed: EFturnover 0.8 and EFturnover 0.9 failed directly or took longer than 1 h.

The models were run for as long as they took to converge. Full (100%) convergence of all permuted models was not required, but the number of converged models is reported in the output.


