Aim

Procedure

  1. Calculate threshold averages with the script cluster_scripts_documentation.Rmd, which produces the input datasets for this script
  2. Quality check of step 1
  3. Calculate threshold averages for the maxspline (--> average barplots)
  4. Calculate threshold averages for the whole ispline (--> average lineplots)

Produce input data

Please run the procedure described in cluster_scripts_documentation.Rmd, under "plot Uncertainty : get sd of maxspline values". The procedure consists of running the edited function gdm::plotUncertainty_slim for each of the thresholds 0.1-0.9 on the models for EFturnover and EFnestedness. The output datasets are named "gdm_EFturnover_0.\<?>_LUI_uncertainty.Rds" (one per threshold, and analogously for EFnestedness).

# dummy code to test if output can be generated in principle
# test <- plotUncertainty_slim(gdminput, sampleSites = 0.3, bsIters = 5, geo = T, cores = 2)
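A hedged sketch of that procedure (the authoritative version lives in cluster_scripts_documentation.Rmd; the model object names gdm_EFturnover_0.1 etc. are assumptions here):

# sketch : run the uncertainty calculation for each threshold and EF component
# (assumes the fitted GDM models are available as gdm_EFturnover_0.1, ..., gdm_EFnestedness_0.9)
for (component in c("EFturnover", "EFnestedness")) {
  for (thr in paste0("0.", 1:9)) {
    modname <- paste0("gdm_", component, "_", thr)
    res <- plotUncertainty_slim(get(modname), sampleSites = 0.3, bsIters = 100,
                                geo = T, cores = 2)
    saveRDS(res, file = paste0(modname, "_LUI_uncertainty.Rds"))
  }
}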

Requirements

library(corrplot)
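The code below additionally relies on the following packages (presumably attached via analysis_nonpublic.R; listed here for completeness):

library(data.table)   # rbindlist, melt, dcast, setDT, fread
library(ggplot2)      # line plots, ribbons, heatmaps
library(cowplot)      # theme_half_open, plot_grid, ggdraw, get_legend
library(RColorBrewer) # brewer.pal for the heatmap palette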

Quality check

Aim of this section : assess the quality of the uncertainty calculation which is done in cluster_scripts_documentation.Rmd. Are the datasets complete? Do the values make sense?

Read input data

# Load requirements from <..>nonpublic.R file
sections_to_be_loaded <- c("thresholds")
# sections_to_be_loaded <- c("thresholds", "results", "gdminput", "gdmoutput")
#TODO : analysis_nonpublic.R tries to read model_results.csv, which does not exist. Update to the new file name.
source("vignettes/analysis_nonpublic.R")
theme_set(theme_half_open())

The list ispline_uncertainty_all_thresholds contains the uncertainty (deviation) of each ispline separately. Elements of the list are named after the respective models.
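A quick completeness check of the list (assuming the naming pattern gdm_EF\<component>_\<threshold> used below):

# completeness check : are all 18 list elements present? (2 components x 9 thresholds)
expected <- paste0("gdm_EF", rep(c("turnover", "nestedness"), each = 9), "_0.", 1:9)
setdiff(expected, names(ispline_uncertainty_all_thresholds)) # character(0) means complete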

# Quality check of cluster stuff
par(mfrow = c(1, 2))
# put07 <- ispline_uncertainty_all_thresholds[["gdm_EFturnover_0.7"]]
put01 <- ispline_uncertainty_all_thresholds[["gdm_EFturnover_0.1"]]

# plot to see if the values make sense

# PLOT 1
# autotroph sim
plot(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_fullModel_Y, 
     type = "l", ylim = c(-0.1, 1))
lines(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_minusSD_Y)
lines(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_plusSD_Y)
# geodist
plot(put01$Geographic_fullModel_X, put01$Geographic_fullModel_Y, 
     type = "l", ylim = c(-0.1, 1))
lines(put01$Geographic_fullModel_X, put01$Geographic_minusSD_Y)
lines(put01$Geographic_fullModel_X, put01$Geographic_plusSD_Y)


# PLOT 2
# LUI
plot(put01$LUI_fullModel_X, put01$LUI_fullModel_Y, type = "l", 
     ylim = range(put01$LUI_fullModel_Y) + c(-0.02, 0.05))
polygon(x = c(put01$LUI_fullModel_X, rev(put01$LUI_fullModel_X)), 
        y = c(put01$LUI_minusSD_Y, rev(put01$LUI_plusSD_Y)), 
        col = "gray", border = NA)
lines(put01$LUI_fullModel_X, put01$LUI_fullModel_Y)

X values should be the same everywhere

plot(put01$Geographic_fullModel_X, put01$autotroph.beta.sim_fullModel_X)
m <- cor(put01[, grep("fullModel_X", names(put01))], method = "spearman")
corrplot(m, tl.cex = 0.2, type = "lower",
         method = "number", number.cex = 0.3)

Calculate weighted average line models

Average effects over all thresholds, weighted by the ispline sd.

from Gossner 2016 : "We also calculated a bootstrapped P value for each term in the full GDM, using the gdm.varImp function in the gdm library (Supplementary Table 5-2). Additionally we estimated uncertainty for the GDM plots by using 100 bootstraps for each model, each time removing 30% of the plot pairs and then fitting a GDM and extracting the predictions. We then calculated the s.d. of the predictions and added this (±) to the fitted line (Extended Data Fig. 6)."

Ran the permutations on the cluster with the following parameters : plotUncertainty_getsd(gdm_model, sampleSites = 0.3, bsIters = 100, geo = T, splineCol = "black", cores = 2). Results are stored at : planteco/.../BetaDivMultifun/analysis/output_datasets/uncertainty_calc/
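For reference, a schematic of what the bootstrap does (a sketch only, not the actual implementation; gdminput stands for a gdm site-pair table):

# schematic of the bootstrap (sketch; the real code is plotUncertainty_getsd)
bsIters <- 100
ys <- replicate(bsIters, {
  # drop 30% of the plot pairs, refit the GDM, extract the ispline predictions
  keep <- sample(nrow(gdminput), size = round(0.7 * nrow(gdminput)))
  m <- gdm::gdm(gdminput[keep, ], geo = TRUE)
  gdm::isplineExtract(m)$y
})
# sd of the predictions across bootstraps, added as +/- sd to the fitted line
sd_y <- apply(ys, c(1, 2), sd)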

Calculate the weighted average over all models. The GDM models are not all of the same quality, so each ispline is weighted by its uncertainty ($w$).

$$\mu = \frac{e_{1}w_{1} + e_{2}w_{2} + \dots + e_{9}w_{9}}{w_{1}+w_{2} + \dots + w_{9}} = \frac{\sum_{i=1}^{n} e_{i}\frac{1}{sd_{i}}}{\sum_{i=1}^{n} \frac{1}{sd_{i}}} \; ; \quad w_{i} = \frac{1}{sd_{i}}$$ where each estimate $e_{i}$ is weighted by the inverse of its uncertainty, $w_{i} = \frac{1}{sd_{i}}$.

The weighted standard deviation is calculated as : $$w_{sd} = \sqrt{\sum_{i = 1}^{n} w_i (x_i - \mu)^2}$$
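A toy numeric check of both formulas (all numbers invented for illustration):

# toy check of the weighted mean and weighted sd (invented numbers)
e <- c(0.10, 0.12, 0.08)          # estimates from 3 models
s <- c(0.01, 0.05, 0.02)          # their uncertainties (sd)
w <- 1 / s                        # weights
mu <- sum(e * w) / sum(w)         # weighted mean, ~0.096 : pulled towards the low-sd estimates
w_sd <- sqrt(sum(w * (e - mu)^2)) # weighted sd as implemented in step (6) below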

Clean the uncertainty ispline table

For each predictor :
- get X values : these are the same in all models --> merge by this value
- get Y values : model-specific _minusSD_Y, fullModel_Y and _plusSD_Y

# names(ispline_uncertainty_all_thresholds) # names of the list correspond to model names
# To each element of the list (each data frame), add a column with the respective model name
# use workaround : lapply only gives you the elements of the vector you pass it.
#    The usual work-around is to pass it the names or indices of the vector instead
#    of the vector itself.
# The list and its names are assigned to the additional function arguments y and n;
#    i receives the unassigned argument, i.e. the seq_along() index.
ispline_uncertainty_all_thresholds <- lapply(seq_along(ispline_uncertainty_all_thresholds), 
                function(y, n, i) {y[[i]]$model_name <- n[i]; y[[i]]$value_ID <- seq(1, 8350); return(y[[i]])},
       y = ispline_uncertainty_all_thresholds,
       n = names(ispline_uncertainty_all_thresholds))
# pile all datasets on top of each other
ispline_uncertainty_all_thresholds <- rbindlist(ispline_uncertainty_all_thresholds, use.names = T)
# keep the relevant columns : drop the X uncertainty columns (_minusSD_X, _plusSD_X)
relevant_column_names <- unique(c("model_name", "value_ID",
                           grep("_fullModel_X|_minusSD_Y|fullModel_Y|_plusSD_Y|SDn",
                                names(ispline_uncertainty_all_thresholds), value = T)
                           ))
ispline_uncertainty_all_thresholds <- ispline_uncertainty_all_thresholds[, ..relevant_column_names]
rm(relevant_column_names)

# filter relevant x values
# 8350 x values are just too many. The result is not more accurate with more values.
# take around 334 values, plus the max value.
relevant_rowindices <- c(seq(1, 8350, by = 25), 8350)
ispline_uncertainty_all_thresholds <- ispline_uncertainty_all_thresholds[value_ID %in% relevant_rowindices, ]
#TODO rename the rownames to 1 : 335 again --> easier to handle!

# # #
# quality control
# plot individual predictor of one model
ggplot(ispline_uncertainty_all_thresholds[model_name == "gdm_EFturnover_0.2",
                                          .(autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)], 
       aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)) +
  geom_line()
ggplot(ispline_uncertainty_all_thresholds[model_name == "gdm_EFnestedness_0.5", .(autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)], aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)) +
  geom_line()

# plot individual predictor of all models
ggplot(ispline_uncertainty_all_thresholds, aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y, col = model_name)) +
  geom_line()

ggplot(ispline_uncertainty_all_thresholds, 
       aes(x = autotroph.beta.sim_fullModel_X, autotroph.beta.sim_fullModel_Y, 
           col = model_name)) +
  geom_line()

# save output
# saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty.Rds"))

Calculate weighted average

Calculate $\sigma$ for each value of X, for each predictor, and for each model individually. In other words, calculate how wide the confidence interval is at a given position in X for a given predictor and model. Note : all models of the same predictor share the same X values, which are identified by value_ID.

Calculate $\sigma$ : $\text{minusSD\_Y} = \mu - \sigma$ and $\text{plusSD\_Y} = \mu + \sigma$. Therefore : $\mu + \sigma - (\mu - \sigma) = \mu + \sigma - \mu + \sigma = 2\sigma$, and thus : $\sigma = \frac{\text{plusSD\_Y} - \text{minusSD\_Y}}{2}$. The weight is then $w = \frac{1}{\sigma}$ (step 3 below).
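A one-line numeric check (toy numbers):

# toy check : recover sigma from the +/- sd curves
mu <- 0.2; sigma <- 0.05
isTRUE(all.equal(((mu + sigma) - (mu - sigma)) / 2, sigma)) # TRUE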

# ispline_uncertainty_all_thresholds <- readRDS(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty.Rds"))

ispline_uncertainty_all_thresholds <- melt(ispline_uncertainty_all_thresholds, id.vars = c("model_name", "value_ID"))

# create helper dataset of N
ispline_iterations <- unique(ispline_uncertainty_all_thresholds[grep("SDn", variable), .(model_name, variable, value)])
ispline_iterations[, predictor := sub("_SDn", "", variable)]
setnames(ispline_iterations, old = "value", new = "SDn")
ispline_iterations[, variable := NULL]

# clean ispline_uncertainty table
ispline_uncertainty_all_thresholds[, predictor := sub("_[a-zA-Z]+_[XY]$", "", variable, perl = T)]
ispline_uncertainty_all_thresholds[, predictor := sub("_SDn", "", predictor)]
ispline_uncertainty_all_thresholds[, Variable_type := sub("soil_|isolation_", "", sub("^[a-zA-Z.]+_", "", variable, perl = T), perl = T)] 
ispline_uncertainty_all_thresholds[, variable := NULL]
ispline_uncertainty_all_thresholds[grep("X", ispline_uncertainty_all_thresholds$Variable_type), XY := "X"]
ispline_uncertainty_all_thresholds[grep("Y", ispline_uncertainty_all_thresholds$Variable_type), XY := "Y"]
ispline_uncertainty_all_thresholds[, Model_type := "nestedness"]
ispline_uncertainty_all_thresholds[grep("turnover", model_name), Model_type := "turnover"]

# 1. create dataset with X values which will be used to get the rest
Xtest <- ispline_uncertainty_all_thresholds[XY == "X", ]
setnames(Xtest, old = "value", new = "Xvalue")
Xtest <- Xtest[, .(model_name, Model_type, predictor, Xvalue, value_ID)]

# 2. merge the Y values to this one
ispline_uncertainty_all_thresholds <- merge(Xtest, ispline_uncertainty_all_thresholds[XY == "Y", .(model_name, Model_type, predictor, value_ID, value, Variable_type)],
      by = c("model_name", "predictor", "value_ID", "Model_type"))
rm(Xtest)

ispline_uncertainty_all_thresholds <- dcast(ispline_uncertainty_all_thresholds, model_name + Model_type + predictor + value_ID + Xvalue ~ Variable_type)

# # # # #
# Calculate the weight SD_Y
#
# # # #
# (1) CALCULATE UNCERTAINTY (SD)
ispline_uncertainty_all_thresholds[, sd := (plusSD_Y - minusSD_Y)/2]
range(ispline_uncertainty_all_thresholds$sd)
par(mfrow = c(1,2))
hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), main = "distribution of sd\n along all predictors and values")
#
# # CONSIDERATIONS
# # hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), ylim = c(0, 4000))
# hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), ylim = c(0, 300), main = "")
# # save as "GDM_multifun_thresholds_hist_all_sd.pdf"
# dev.off()
# # plotting only at value max
# par(mfrow = c(1,2))
# hist(ispline_uncertainty_all_thresholds[value_ID == 8350, sd], xlim = c(0, 0.95), ylim = c(0, 300),
#      main = "distribution of sd\n along all predictors\nat maximum")
# hist(ispline_uncertainty_all_thresholds[value_ID == 8350, sd], xlim = c(0, 0.95), ylim = c(0, 60), main = "")
# # save as "GDM_multifun_thresholds_hist_all_sd_at_maximum.pdf"
# dev.off()
# # plotting only at min value
# par(mfrow = c(2,2))
# hist(ispline_uncertainty_all_thresholds[value_ID == 1, sd], ylim = c(0, 100),
#      main = "distribution of sd along all predictors\nat min, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 76, sd], ylim = c(0, 100),
#      main = "distribution of sd along all predictors\nat 4th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 2476, sd], ylim = c(0, 300),
#      main = "distribution of sd along all predictors\nat 100th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976, sd], ylim = c(0, 300),
#      main = "distribution of sd along all predictors\nat 200th Xvalue, of total 335")
# # save as "GDM_multifun_thresholds_hist_all_sd_at_selected_xvalues.pdf"
# dev.off()
# 
# # at selected predictors, at 200th value
# par(mfrow = c(2,2))
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "LUI" & Model_type == "turnover", sd],
#      main = "distribution of sd LUI turnover\nat 200th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "autotroph.beta.sim" & Model_type == "turnover", sd],
#      main = "distribution of sd autotroph.beta.sim turnover\nat 200th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "omnivore.protist.beta.sne" & Model_type == "turnover", sd],
#      main = "distribution of sd omnivore.protist.beta.sne turnover\nat 200th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "Geographic" & Model_type == "turnover", sd],
#      main = "distribution of sd Geographic turnover\nat 200th Xvalue, of total 335")
# # save as "GDM_multifun_thresholds_hist_all_sd_at_selected_xvalues_and_predictors.pdf"
# dev.off()
# 
# 
# # are the 0 sd cases all cases at yvalue = 0 and xvalue = 0?
# hist(unique(ispline_uncertainty_all_thresholds[sd == 0, fullModel_Y]))
# max(unique(ispline_uncertainty_all_thresholds[sd == 0, fullModel_Y]))
# # no, but all at very low values < 0.0051
# 
# # do small sd occurr at small Y predictions?
# ggplot(ispline_uncertainty_all_thresholds, aes(x = sd, y = fullModel_Y, color = interaction(Model_type, predictor))) +
#   geom_point() +
#   theme(legend.position = "none")
# # one line is one model.
# #    within a model : correlation of around 1 for sd increases with yvalue
# # save as "GDM_multifun_thresholds_hist_sd_increase_with_effectsize.pdf"


# # # #
# (2) HANDLING 0 CASES
#   the standard deviation is 0 in some cases: 
#       only occurs at very low predictions < 0.0051. Most are 0, but not all.
#   we could replace the 0 and very small values with the second smallest value
hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.000001), ylim = c(0, 4000))
min(ispline_uncertainty_all_thresholds$sd[!(ispline_uncertainty_all_thresholds$sd == 0)]) 
# is way larger than the machine epsilon (e-12 vs e-16).
# we could also use a 3-step approach : 
#     a. normalise standard deviations to values between 0 and 1
#     b. calculate the weights 1 / sd
#        and replace the Inf values with 1 (maximum weight)
#     c. normalise again
# but also here, at some point we have to replace 0 or Inf values with
#   a number. Therefore, we go for the simpler approach and replace 0 sd values
#   with the smallest non-zero sd (see the line below).
# ispline_uncertainty_all_thresholds[weight == 0, weight := .Machine$double.eps]
ispline_uncertainty_all_thresholds[sd == 0, sd := min(ispline_uncertainty_all_thresholds$sd[!(ispline_uncertainty_all_thresholds$sd == 0)])]
#

# # # #
# (3) CALCULATE WEIGHT
ispline_uncertainty_all_thresholds[, weight := 1 / sd]
#
# # visualise
# hist(ispline_uncertainty_all_thresholds$weight)
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "LUI" & Model_type == "turnover", weight],
#      main = "distribution of unstandardised weight LUI turnover\nat 200th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "autotroph.beta.sim" & Model_type == "turnover", weight],
#      main = "distribution of unstandardised weight autotroph.beta.sim turnover\nat 200th Xvalue, of total 335")
range(ispline_uncertainty_all_thresholds$weight)

# # # #
# (4) NORMALISE WEIGHTS
#
#  the normalisation needs to be done :
#    - per value_ID : because small values have smaller standard deviations; the models should get weights accordingly
#    - per predictor : because ranges differ across predictors
#    - per model type (turnover vs. nestedness) : because ranges can differ
#  weights are normalised between 0 and 1; each normalisation scales the 9 values
#    of one predictor at one value_ID within one model type
nrow(ispline_uncertainty_all_thresholds) == 18 * 35 * 335

# do the normalisation NOT over all, but grouped by
# note : no shifting to zero needed, because values are already shifted to 0.
# grouped min-max scaling with data.table; a toy test below confirms it does what I want.
setDT(ispline_uncertainty_all_thresholds)[, weight_norm := (weight - min(weight, na.rm = T)) / (max(weight, na.rm = T) - min(weight, na.rm = T)), 
          by = list(predictor, value_ID, Model_type)]
range(ispline_uncertainty_all_thresholds$weight_norm, na.rm = T) # between 0 and 1
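# toy test of the grouped min-max scaling (invented numbers), addressing the note above
tst <- data.table(g = rep(c("a", "b"), each = 3), weight = c(1, 2, 3, 10, 20, 30))
tst[, weight_norm := (weight - min(weight)) / (max(weight) - min(weight)), by = g]
tst # both groups scale to 0, 0.5, 1 --> the grouped scaling behaves as intended
rm(tst)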
# some cases lead to NaN : the sd is identical across all models (max == min). This happens ONLY where fullModel_Y = 0.
# deleteme <- ispline_uncertainty_all_thresholds[is.nan(weight_norm), ][order(predictor, value_ID, Model_type), .(predictor, value_ID, Model_type, fullModel_Y, sd, weight_norm, weight)]
# nrow(ispline_uncertainty_all_thresholds[is.nan(weight_norm) & fullModel_Y != 0, ][order(predictor, value_ID, Model_type), .(predictor, value_ID, Model_type, fullModel_Y, sd, weight_norm, weight)])
#
# replace NaN values with 1
ispline_uncertainty_all_thresholds[is.nan(weight_norm), weight_norm := 1]
#
# # CONSIDERATIONS
# # is the range within one predictor within one valueID within one model type ok?
# hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor ==  "LUI" & Model_type == "turnover", weight_norm],
#      main = "distribution of normalised weight LUI turnover\nat 200th Xvalue, of total 335")
# hist(ispline_uncertainty_all_thresholds[value_ID == 2476 & predictor ==  "autotroph.beta.sne" & Model_type == "turnover", weight_norm],
#      main = "distribution of normalised weight autotroph.beta.sne turnover\nat 100th Xvalue, of total 335")



# # # #
# (5) CALCULATE WEIGHTED MEAN OF FULLMODEL Y
#
# calculate the weighted mean of fullModel_Y
ispline_uncertainty_all_thresholds[, weighted_Y := fullModel_Y * weight_norm]
setDT(ispline_uncertainty_all_thresholds)[, sum_of_weight := sum(weight_norm), by = list(predictor, value_ID, Model_type)]
setDT(ispline_uncertainty_all_thresholds)[, sum_of_weighted_Y := sum(weighted_Y), by = list(predictor, value_ID, Model_type)]
ispline_uncertainty_all_thresholds[, mean_Y := sum_of_weighted_Y/ sum_of_weight]
ispline_uncertainty_all_thresholds[, c("sum_of_weight", "sum_of_weighted_Y", "weighted_Y") := NULL]
#

# # # #
# (6) CALCULATE WEIGHTED SD OF FULLMODEL Y
#
ispline_uncertainty_all_thresholds[, weighted_xi_minus_mu := weight_norm * ((fullModel_Y - mean_Y)^2)]
# sum and square root
setDT(ispline_uncertainty_all_thresholds)[, weighted_SD_mean_Y := sqrt(sum(weighted_xi_minus_mu)), by = list(predictor, value_ID, Model_type)]
# weighted SE
#    n is 9, because there are 9 models :
#    every average is calculated from 9 weighted values
ispline_uncertainty_all_thresholds[, weighted_SE_mean_Y := weighted_SD_mean_Y / sqrt(9)]
ispline_uncertainty_all_thresholds[, weighted_xi_minus_mu := NULL]


# # # #
# (7) CALCULATE STANDARD ERROR
#
# NOTE : not everywhere 100 iterations were possible. Add SDn which gives the number of
#  iterations in a given model
# Calculate SE = sd / sqrt(N)
table(ispline_iterations$SDn)
ispline_uncertainty_all_thresholds <- merge(ispline_uncertainty_all_thresholds, ispline_iterations, 
      by = c("predictor", "model_name"), all = T)
ispline_uncertainty_all_thresholds[, se := sd / sqrt(SDn)]

# save output dataset
# saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty_weighted_avg.Rds"))
# fwrite(ispline_uncertainty_all_thresholds, file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.csv"))

Smoothing

#TODO smooth the lines in the line plots. Some curves have kinks. Do the same for the sd!
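No smoothing is implemented yet; a possible starting point is a centered rolling mean per curve (a sketch only : the window width of 5 and data.table::frollmean are assumptions, not a final choice):

# sketch : smooth the weighted mean line and its uncertainty per predictor and model type
smoothed <- unique(ispline_uncertainty_all_thresholds[,
    .(predictor, Model_type, value_ID, Xvalue, mean_Y, weighted_SE_mean_Y)])
setorder(smoothed, predictor, Model_type, value_ID)
smoothed[, `:=`(mean_Y_smooth = frollmean(mean_Y, n = 5, align = "center"),
                se_smooth     = frollmean(weighted_SE_mean_Y, n = 5, align = "center")),
         by = .(predictor, Model_type)]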

Plotting

# ispline_uncertainty_all_thresholds <- readRDS(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.Rds"))
# ispline_uncertainty_all_thresholds <- fread(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.csv"))

# Quality checks
# with standard deviation
ggplot(ispline_uncertainty_all_thresholds[predictor == "autotroph.beta.sne" & Model_type == "nestedness"], 
       aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) +
  geom_line() +
  geom_ribbon(aes(ymin = minusSD_Y, ymax = plusSD_Y), alpha = 0.1, linetype = 0) +
  scale_fill_brewer(palette="Set1") +
  scale_color_brewer(palette="Set1") +
  coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") +
  geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red")

# with standard error
ggplot(ispline_uncertainty_all_thresholds[predictor == "autotroph.beta.sne" & Model_type == "nestedness"], 
       aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) +
  geom_line() +
  geom_ribbon(aes(ymin = fullModel_Y - se, ymax = fullModel_Y + se), alpha = 0.1, linetype = 0) +
  scale_fill_brewer(palette="Set1") +
  scale_color_brewer(palette="Set1") +
  coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") +
  geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red")


ggplot(ispline_uncertainty_all_thresholds[predictor == "LUI" & Model_type == "nestedness"], 
       aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) +
  geom_line() +
  scale_fill_brewer(palette="Set1") +
  scale_color_brewer(palette="Set1") +
  coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") +
  geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red")
#
ggplot(ispline_uncertainty_all_thresholds[predictor == "plot_isolation" & Model_type == "nestedness",], 
       aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) +
  geom_line() +
  scale_fill_brewer(palette="Set1") +
  scale_color_brewer(palette="Set1") +
  coord_cartesian(ylim = c(-0.02, 0.1)) +
  geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red") +
  xlab("") + ylab("")
#

Structured plotting

create plot data

Match with plotting table nicenames

setnames(ispline_uncertainty_all_thresholds, old = "predictor", new = "names")
# reshape table : bring the mean into its own rows
test <- melt(ispline_uncertainty_all_thresholds,
     measure.vars = c("fullModel_Y", "mean_Y", "weighted_SD_mean_Y", "weighted_SE_mean_Y"),
     id.vars = c("model_name", "Model_type", "value_ID", "names", "Xvalue")
    )
test[variable == "mean_Y", model_name := paste0("weighted_mean_", Model_type)]
test[variable == "weighted_SD_mean_Y", model_name := paste0("weighted_mean_", Model_type)]
test[variable == "weighted_SE_mean_Y", model_name := paste0("weighted_mean_", Model_type)]
test <- unique(test[variable != "fullModel_Y", ])
test <- dcast(test, model_name + Model_type + value_ID + names + Xvalue ~ variable,
      value.var = "value")
# add the source models again (fullModel_Y)
setnames(test, old = c("mean_Y", "weighted_SD_mean_Y", "weighted_SE_mean_Y"), new = c("value", "sd", "se"))
setnames(ispline_uncertainty_all_thresholds, old = "fullModel_Y", new = "value")
test <- rbindlist(list(test,
          ispline_uncertainty_all_thresholds[, .(model_name, Model_type, value_ID, names, Xvalue, value, sd, se)]),
          use.names = T)
nrow(test)  == nrow(ispline_uncertainty_all_thresholds) + 335 * 35 * 2
# has 335 * 35 * 2 rows more. Should contain 234500 rows.

ispline_uncertainty_all_thresholds <- merge(test, nicenames, by = c("names"))

setnames(ispline_uncertainty_all_thresholds, 
         old = c("Xvalue"),
         new = c("xaxis"))
# add linewidth for mean model
ispline_uncertainty_all_thresholds[, lwd := 1]
ispline_uncertainty_all_thresholds[grep("weighted_mean", ispline_uncertainty_all_thresholds$model_name), lwd := 2]

# save output dataset
# saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty_weighted_avg_plotdata.Rds"))
# fwrite(ispline_uncertainty_all_thresholds, file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))

lineplot with standard error

Plot each predictor individually, showing all model lines with uncertainty (standard error)

pdf("GDM_weighted_avg_over_thresholds_allmods.pdf", onefile = T, paper = "a4r")
for(mtype in c("nestedness", "turnover")){
  for(pred in unique(ispline_uncertainty_all_thresholds$names)){
    p <- create_gdm_lineplot_predictorwise_thresholds(
      data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ],
      ribbon = T,
      ribbontype = "se",
      legend = T,
      plottitle = paste(pred, mtype),
      ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T)
      )
    plot(p)
  }
}
dev.off()


pdf("GDM_weighted_avg_over_thresholds_allmods_printable.pdf", onefile = T, paper = "a4r")
for(mtype in c("nestedness", "turnover")){
  for(pred in unique(ispline_uncertainty_all_thresholds$names)){
    p <- create_gdm_lineplot_predictorwise_thresholds(
      data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ],
      ribbon = T,
      ribbontype = "se",
      legend = F,
      plottitle = paste(pred, mtype),
      ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T)
      )
    plot(p)
  }
}

# add legend
p <- create_gdm_lineplot_predictorwise_thresholds(
      data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ],
      ribbon = T,
      ribbontype = "se",
      legend = T,
      plottitle = paste(pred, mtype),
      ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T)
      )
ggdraw(get_legend(p))

dev.off()

line plots

Nestedness and turnover : edit manually (set the model name below, run once per model type).

lineplot_data <- ispline_uncertainty_all_thresholds[model_name == "weighted_mean_nestedness"]

p <- create_gdm_lineplot(lineplot_data[ground == "a"], legend = F, ymax = max(lineplot_data$value),
                    ribbon = F)
q <- create_gdm_lineplot(lineplot_data[ground == "b"], legend = F, ymax = max(lineplot_data$value),
                    ribbon = F)

# abiotic subplots
# LUI
luidat <- lineplot_data[names %in% "LUI",]
deltaluidat <- lineplot_data[names %in% "deltaLUI",]
l <- create_gdm_lineplot(luidat, legend = F, ymax = max(lineplot_data$value),
                         ribbon = F)
dl <- create_gdm_lineplot(deltaluidat, legend = F, ymax = max(lineplot_data$value),
                         ribbon = F)

lgeo <- create_gdm_lineplot(lineplot_data[names %in% c("Geographic")], legend = F, 
                            ymax = max(lineplot_data$value), ribbon = F)
lsoil <- create_gdm_lineplot(lineplot_data[names %in% c("edis_soil")], legend = F, 
                             ymax = max(lineplot_data$value), ribbon = F)
lisol <- create_gdm_lineplot(lineplot_data[names %in% c("plot_isolation")], legend = F, 
                             ymax = max(lineplot_data$value), ribbon = F)

# create legends
biotic_legend <- get_nice_legend(type = "biotic")
biotic_legend <- plot_grid(NULL, NULL, NULL, NULL, ggdraw(biotic_legend), NULL, rel_widths = c(0.1, 0.8, 0.1), rel_heights = c(0.01, 0.99))
abiotic_legend <- get_nice_legend(type = "abiotic")
abiotic_legend <- plot_grid(NULL, NULL, NULL, NULL, ggdraw(abiotic_legend), NULL, rel_widths = c(0.1, 0.8, 0.1), rel_heights = c(0.01, 0.99))

# combine plots
# create classic lineplot
lin <- plot_grid(
  plot_grid(p, q, l,  labels = c('A', 'B', 'C'), nrow = 3),
  plot_grid(biotic_legend, abiotic_legend, nrow = 2, rel_heights = c(0.7, 0.3))
  )
lin_abiotic <- plot_grid(dl, lgeo, lsoil, lisol,
                         labels = c("D", "", "", ""),
                         align = "h", nrow = 1)

all_lines <- plot_grid(lin, lin_abiotic, nrow = 2, rel_heights = c(0.8, 0.2))

# save as A4 pdf: 
#    "GDM_multifun_thresholds_lineplot_allpredictor_nestedness.pdf"
#    "GDM_multifun_thresholds_lineplot_allpredictor_turnover.pdf"

Models max height

Many comparisons are done at max height --> better overview.

Calculate at max height and overviewbars

The dataset is (1) reduced to max values, and (2) the overviewbars are calculated.

(1) Reduce to max values : create dataset restab

ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))

# reduce to max values
restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ]
restab[, value_ID := NULL]
restab[, xaxis := NULL]

# keep average models
restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ]
# 70 rows left, because 35 predictors in 2 models

# reformat table :
# model results of turnover and nestedness average models should be found in separate columns
restab <- data.table::dcast(
  data = restab,
  formula = paste(paste(names(nicenames), collapse = " + "),"~ model_name"),
  value.var = c("value", "se"))

# rename value column to model name
names(restab)[grep("value", names(restab))] <- sub("value_", "", grep("value", names(restab), value = T))

(2) Calculate overviewbars : dataset p_ov_allthres is produced

a <- create_single_funs_overviewbars(restab2 = restab, 
                                     rel_colnames = c("weighted_mean_turnover","weighted_mean_nestedness"))

# bring to ggplot format
ov_ab_allthresholds <- melt(a$above_below, id.vars = c("ground", "color"))
ov_tn_allthresholds <- melt(a$turnover_nestedess, id.vars = c("component", "color"))
# combine into a shared dataset
ov_ab_allthresholds[, overviewbar_type := "sum_above_below"]
ov_tn_allthresholds[, overviewbar_type := "sum_turn_nes"]
ov_allthresholds <- rbindlist(list(ov_ab_allthresholds, ov_tn_allthresholds), use.names = T, fill = T)
rm(ov_ab_allthresholds); rm(ov_tn_allthresholds)
# add id variable
ov_allthresholds[, mod_ov_id := paste(variable, overviewbar_type, sep = "_")]

# create plots
p_ov_allthres <- create_single_funs_overviewbar_plot(singleF_restab = ov_allthresholds, 
  pos = "stack", xvar_name = "mod_ov_id")
# (earlier) saved as : <date>_GDM_multifun_thresholds_allthresholds_overview_bars<_dodge>
#TODO potentially delete because not used?

bar plots

classic barplots

# produce barplots
for(i in c("weighted_mean_nestedness", "weighted_mean_turnover")){
  p <- create_bio_aboveground_barplot(type = "stacked", # type in c("stacked", "grouped")
                                    yvar_name = i,
                                    plottitle = "gdm_EFnestedness_weighted_mean",
                                    errorbar = F) 
  b <- create_bio_belowground_barplot(type = "stacked", yvar_name = i)
  q <- create_abio_barplot(type = "stacked", yvar_name = i)

  # produce overviewbars
  ov <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], 
                                            pos = "stack", xvar_name = "overviewbar_type") +
    coord_flip()
  ovL <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], 
                                             pos = "stack", xvar_name = "overviewbar_type", legend = T)


  multipanel_with_overview_bars <- plot_grid(p, b, q, ov, 
                                             labels = c('A', '', 'B', 'C'), 
                                             label_size = 12, nrow = 4,
                                             rel_heights = c(0.22, 0.32, 0.17, 0.13), 
                                             align = "v") # 2 summary bars
  # save
  ggsave(filename = paste(pathtoout, "/GDM_multifun_thresholds_", i, "_barplot.pdf", sep = ""), 
         plot = multipanel_with_overview_bars, 
         device=cairo_pdf, width = 11.96, height = 8.27, units = "in", dpi = 900)
}
barplots with error bars

show standard errors

for(i in c("weighted_mean_nestedness", "weighted_mean_turnover")){
  p <- create_bio_aboveground_barplot(type = "grouped", # type in c("stacked", "grouped")
                                yvar_name = i,
                                plottitle = "gdm_EFnestedness_weighted_mean",
                                errorbar = T,
                                pylim = c(0, 0.44))
  b <- create_bio_belowground_barplot(type = "grouped", yvar_name = i,
                                errorbar = T,
                                pylim = c(0, 0.44))
  q <- create_abio_barplot(type = "grouped", yvar_name = i, errorbar = T,
                                pylim = c(0, 0.44))

  # produce overviewbars
  ov <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], 
                                            pos = "stack", xvar_name = "overviewbar_type") +
    coord_flip()
  ovL <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], 
                                             pos = "stack", xvar_name = "overviewbar_type", legend = T)


  multipanel_with_overview_bars <- plot_grid(p, b, q, ov, 
                                             labels = c('A', '', 'B', 'C'), 
                                             label_size = 12, nrow = 4,
                                             rel_heights = c(0.24, 0.34, 0.11, 0.105), 
                                             align = "v") # 2 summary bars
  # save
  ggsave(filename = paste(pathtoout, "/GDM_multifun_thresholds_", i, "_barplot_se.pdf", sep = ""), 
         plot = multipanel_with_overview_bars, 
         device=cairo_pdf, width = 11.96, height = 8.27, units = "in", dpi = 900)
}

Heatmaps

ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))

# reduce to max values
restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ]
restab[, value_ID := NULL]
restab[, xaxis := NULL]

# reformat table :
# model results in separate columns
restab <- data.table::dcast(
  data = restab,
  formula = paste(paste(names(nicenames), collapse = " + "),"~ model_name"),
  value.var = c("value"))
all models

All models including average

include <- c("nicenames", names(restab)[grep("turnover|nestedness", names(restab), perl = T)])
pal <- RColorBrewer::brewer.pal(9, "YlOrRd")

df <- restab[ground == "a", ]
df <- data.table::melt(df[, ..include], id.vars = c("nicenames"))
df[value == 0, value := NA] # 0 values are set to NA to make them appear grey in the plot

a <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = pal, na.value = "grey80",
                       limits = c(0, 0.35),
                       breaks = c(0, 0.1, 0.2, 0.3, 0.35), # breaks must match the labels
                       labels = c(0, 0.1, 0.2, 0.3, 0.35)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 1, 
    size = 9, hjust = 1),
    plot.margin = margin(l = 50, r = 20),
    axis.line.x=element_blank(),
    axis.text.y = element_text(size=9, angle = 0),
    axis.ticks.x = element_blank(),
    panel.grid.major.x = element_line(color = "grey"),
    axis.title = element_blank(),
    legend.position = c(-0.11, 1.2)) +
  scale_x_discrete(position = "top") +
  labs(fill = "") +
  guides(fill = guide_colourbar(ticks.colour = "black", frame.colour = "black"))
# aL <- get_legend(a)
# a <- a + theme(legend.position = "none")

df <- restab[ground == "b", ]
df <- data.table::melt(df[, ..include], id.vars = c("nicenames"))
df[value == 0, value := NA]
b <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = pal, na.value = "grey80") +
  theme(legend.position = "none", 
        axis.title = element_blank(),
        axis.text.y = element_text(size=9, angle = 0),
        plot.margin = margin(l = 50, r = 20),
        axis.text.x = element_blank(),
        axis.line.x=element_blank(),
        axis.ticks.x = element_blank(),
        panel.grid.major.x = element_line(color = "grey"))


df <- restab[ground == "x", ]
df <- data.table::melt(df[, ..include], id.vars = c("nicenames"))
df[value == 0, value := NA]
c <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = pal, na.value = "grey80") +
  theme(legend.position = "none", 
        axis.title = element_blank(),
        axis.text.y = element_text(size=9, angle = 0),
        plot.margin = margin(l = 50, r = 20),
        axis.text.x = element_blank(),
        axis.line.x=element_blank(),
        axis.ticks.x = element_blank(),
        panel.grid.major.x = element_line(color = "grey"))


heatmap <- plot_grid(a, b, c, labels = c('A', '', 'B'), label_size = 12, 
                     nrow = 3, rel_heights = c(0.52, 0.39, 0.09), align = "v")

ggsave(filename = paste(pathtoout, "/plot_GDM_thresholds_heatmap.pdf", sep = ""), plot = heatmap,
       device = cairo_pdf,
       width = 11.96, height = 8.27, units = "in", dpi = 900)

clustered heatmap without average models

include <- include[-which(include %in% c("weighted_mean_nestedness", "weighted_mean_turnover"))]
restab1 <- restab[, ..include]
restab1 <- as.matrix(restab1[, -1])
rownames(restab1) <- restab$nicenames
colnames(restab1) <- gsub("EFnestedness_", "NES ", gsub("EFturnover_", "TO ", 
                                                        gsub("gdm_", "", gsub("_LUI", "", colnames(restab1)))))

heatmap(restab1, hclustfun = hclust)
# manually saved as "plot_GDM_thresholds_heatmap_cluster.pdf"

Create synthesis plot

Create a plot with average effect sizes across the 2 main models : EFturnover_WEIGHTED_mean, EFnestedness_WEIGHTED_mean.

Biotic predictors

DEPREC Avg across 2 main

Average across the 2 main models (remember : the 2 main models are weighted averages over all thresholds).

ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))
# reduce to max values
restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ]
restab[, value_ID := NULL]
restab[, xaxis := NULL]
# keep average models
restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ]
# 70 rows left, because 35 predictors in 2 models

# filter biotic effects and sum the turnover (beta.sim) and nestedness (beta.sne)
#   components of each trophic group within each model
restab <- aggregate(value ~ model_name + legendnames, data = restab[type == "bio", ], FUN = sum)

# quality check
plot(as.factor(restab$model_name), restab$value) # are in same range

# average
restab <- data.table(aggregate(value ~ legendnames, data = restab, FUN = mean))

# sort and plot
restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T)
restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"]
setorder(restab, cols = "value")
restab[, legendnames := factor(legendnames, levels = restab$legendnames)]
p <- ggplot(data = restab, aes(x = legendnames, y = value, fill = color)) +
  geom_bar(stat = "identity", color = "black") + 
  coord_flip() +
  scale_fill_identity() +
  ylab("averages across main models") + xlab("") +
  labs(title = "Average effect size \nacross main models",
              caption = "Average across 2 main models EFturn and EFnes.\n 
       Both are weighted averages of effect sizes across the 9 thresholds.")
p
# saved as : "GDM_multifun_thresholds_synthesis_biotic_ranks.pdf" A5 landscape

Weighted averages of two main models.

ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))
# reduce to max values
restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ]
restab[, value_ID := NULL]
restab[, xaxis := NULL]
# keep average models
restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ]
# 70 rows left, because 35 predictors in 2 models

# filter biotic effects and sum the turnover and nestedness components of each model;
#   sd values are combined as the square root of the summed squares
restab1 <- aggregate(value ~ model_name + legendnames, data = restab[type == "bio", ], FUN = sum)
restab2 <- aggregate(sd ~ model_name + legendnames, data = restab[type == "bio", ], FUN = function(x) sqrt(sum(x^2)))
restab <- data.table(merge(restab1, restab2, by = c("model_name", "legendnames")))
rm(restab1, restab2)

# calculate weighted means
restab[, weighted_means := value * (1 / sd)]
restab[, weight := 1 / sd]
restab1 <- data.table(aggregate(weighted_means ~ legendnames, data = restab, FUN = mean))
restab2 <- data.table(aggregate(weight ~ legendnames, data = restab, FUN = sum))
restab1 <- merge(restab1, restab2, by = "legendnames")
restab1[, avg_across_models := weighted_means / weight] # divide mean of weighted effects by sum of weights
restab <- restab1[, .(legendnames, avg_across_models)]
rm(restab1, restab2)

# sort and plot
restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T)
restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"]
setorder(restab, cols = "avg_across_models")
restab[, legendnames := factor(legendnames, levels = restab$legendnames)]
pw <- ggplot(data = restab, aes(x = legendnames, y = avg_across_models, fill = color)) +
  geom_bar(stat = "identity", color = "black") + 
  coord_flip() +
  scale_fill_identity() +
  ylab("averages across main models") + xlab("") +
  labs(title = "Weighted average effect size \nacross main models",
      caption = "Weighted average across 2 main models EFturn and EFnes.
      (1) Calculated weighted average effect size across 9 thresholds of 
          both turnover and nestedness models (including se) 
      (2) Weighted average of the turnover and nestedness value
          weighted by sd.")
pw

Average ranks

ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))
# reduce to max values
restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ]
restab[, value_ID := NULL]
restab[, xaxis := NULL]
restab <- restab[!model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ]# exclude average models

# filter biotic effects and sum the turnover and nestedness components of each model;
#   sd values are combined as the square root of the summed squares
restab1 <- aggregate(value ~ model_name + legendnames, data = restab[type == "bio", ], FUN = sum)
restab2 <- aggregate(sd ~ model_name + legendnames, data = restab[type == "bio", ], FUN = function(x) sqrt(sum(x^2)))
restab <- data.table(merge(restab1, restab2, by = c("model_name", "legendnames")))
rm(restab1, restab2)

# calculate ranks per model_name. Higher ranks correspond to higher effect sizes
setDT(restab)[, ranks := rank(value), by = list(model_name)]
# average ranks across models
restab <- data.table(aggregate(ranks ~ legendnames, data = restab, FUN = mean))

# convert to plotting dataset
restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T)
restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"]
setorder(restab, cols = "ranks")
restab[, legendnames := factor(legendnames, levels = restab$legendnames)]
pr <- ggplot(data = restab, aes(x = legendnames, y = ranks, fill = color)) +
  geom_bar(stat = "identity", color = "black") +
  coord_flip() +
  scale_fill_identity() +
  ylab("averages across main models") + xlab("") +
  labs(title = "Average rank across \nEFturn and EFnes models",
              caption = "Average ranks across 18 main models EFturn and EFnes.\n
       not weighted by sd.")
pr

Compare different synthesis plots

plot_grid(p, pw, pr, ncol = 3)

Decision notes

Whether we take the average, the weighted average, or average ranks makes a big difference. We go with average ranks because they give a similar weight to all effects, while averaging effect sizes (weighted or unweighted) gives more weight to very large effects. For the synthesis plot, we want to know whether some groups consistently have a considerable effect.
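A toy illustration of why the choice matters (numbers invented):

# toy illustration : one huge effect dominates the mean, average ranks reward consistency
eff <- data.table(group   = c("g1", "g2", "g3"),
                  model_A = c(0.50, 0.02, 0.01),
                  model_B = c(0.01, 0.04, 0.02))
rowMeans(eff[, .(model_A, model_B)])                  # mean : g1 wins via one extreme effect
rowMeans(cbind(rank(eff$model_A), rank(eff$model_B))) # rank : g2 wins, being consistently high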


