cluster_scripts_documentation.Rmd
in order to produce the input
datasets for this scriptPlease run the procedure described in
cluster_scripts_documentation.Rmd
, under plot Uncertainty : get sd of
maxspline values. The procedure consists of running the edited function
gdm::plotUncertainty_slim
on each of the thresholds 0.1-0.9 on the
models for EFturnover and EFnestedness. The output datasets are
"gdm_EFturnover_0.\<?>_LUI_uncertainty.Rds" (respectively for all
thresholds, and for EFnestedness as well.)
# dummy code to test if output can be generated in principle # test <- plotUncertainty_slim(gdminput, sampleSites = 0.3, bsIters = 5, geo = T, cores = 2)
Requirements
library(corrplot)
Aim of this section : assess the quality of the uncertainty calculation which is
done in cluster_scripts_documentation.Rmd
. Are the datasets complete?
Do the values make sense?
Read input data
# Load requirements from <..>nonpublic.R file sections_to_be_loaded <- c("thresholds") # sections_to_be_loaded <- c("thresholds", "results", "gdminput", "gdmoutput") #TODO : model_results.csv wants to be read, but doesn't exist. Put new name source("vignettes/analysis_nonpublic.R") theme_set(theme_half_open())
The list ispline_uncertainty_all_thresholds
contains the deviance of
each ispline separately. Elements of the list are named accordingly.
# Quality check of cluster stuff par(mfrow = c(1, 2)) # put07 <- ispline_uncertainty_all_thresholds[["gdm_EFturnover_0.7"]] put01 <- ispline_uncertainty_all_thresholds[["gdm_EFturnover_0.1"]] # plot to see if makes sense # PLOT 1 # autotroph sim plot(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_fullModel_Y, type = "l", ylim = c(-0.1, 1)) lines(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_minusSD_Y) lines(put01$autotroph.beta.sim_fullModel_X, put01$autotroph.beta.sim_plusSD_Y) # geodist plot(put01$Geographic_fullModel_X, put01$Geographic_fullModel_Y, type = "l", ylim = c(-0.1, 1)) lines(put01$Geographic_fullModel_X, put01$Geographic_minusSD_Y) lines(put01$Geographic_fullModel_X, put01$Geographic_plusSD_Y) # PLOT 2 # LUI plot(put01$LUI_fullModel_X, put01$LUI_fullModel_Y, type = "l", ylim = range(put01$LUI_fullModel_Y) + c(-0.02, 0.05)) polygon(x = c(put01$LUI_fullModel_X, rev(put01$LUI_fullModel_X)), y = c(put01$LUI_minusSD_Y, rev(put01$LUI_plusSD_Y)), col = "gray", border = NA) lines(put01$LUI_fullModel_X, put01$LUI_fullModel_Y)
X values should be the same everywhere
plot(put01$Geographic_fullModel_X, put01$autotroph.beta.sim_fullModel_X) m <- cor(put01[, grep("fullModel_X", names(put01))], method = "spearman") corrplot(m, tl.cex = 0.2, type = "lower", method = "number", number.cex = 0.3)
average effects over all thresholds weighted by ispline sd
from Gossner 2016: - We also calculated a bootstrapped P value for each term in the full GDM, using the gdm.varImp function in the gdm library (Supplementary Table 5-2). Additionally we estimated uncertainty for the GDM plots by using 100 bootstraps for each model, each time removing 30% of the plot pairs and then fitting a GDM and extracting the predictions. We then calculated the s.d. of the predictions and added this (±-) to the fitted line (Extended Data Fig. 6).
runned permutations on the cluster with following parameters :
plotUncertainty_getsd(gdm_model, sampleSites = 0.3, bsIters = 100, geo = T, splineCol = "black", cores = 2)
results are stored at :
planteco/.../BetaDivMultifun/analysis/output_datasets/uncertainty_calc/
Calculate weighted average over all models. The GDM models are not all of the same quality. They are weighted based on the uncertainty of each ispline ($w$).
$$\mu = wa = \frac{e_{1}w_{1} + e_{2}w_{2} + ... + e_{9}w_{9}}{w_{1}+w_{2} + ... + w_{9}} = \frac{\sum_{i=1}^{n} e_{n}\frac{1}{sd_{n}}}{\sum_{i=1}^{n} \frac{1}{sd_{n}}} ; w_{n} = \frac{1}{sd_{n}} $$ where the estimate $e_{n}$ is weighted by its uncertainty $w_{n}$.
The weighted standard deviation is calculated as : $$w_sd = \sqrt{\sum_{i = 1}^{n} w_i (x_i - \mu)}$$
For each Predictor : - get X values : are the same in all models --> merge by this value - get Y values : model specific _minusSD_Y, fullModel_Y and _plusSD_Y
# names(ispline_uncertainty_all_thresholds) # names of the list correspond to model names # To each element of the list (each data frame), add a column with the respective model name # use workaround : lapply only gives you the elements of the vector you pass it. # The usual work-around is to pass it the names or indices of the vector instead # of the vector itself. # The list is assigned to an additional argument of the function. i corresponds to the unassigned # function argument, which is the sequ_along(). ispline_uncertainty_all_thresholds <- lapply(seq_along(ispline_uncertainty_all_thresholds), function(y, n, i) {y[[i]]$model_name <- n[i]; y[[i]]$value_ID <- seq(1, 8350); return(y[[i]])}, y = ispline_uncertainty_all_thresholds, n = names(ispline_uncertainty_all_thresholds)) # pile all datasets on top of each other ispline_uncertainty_all_thresholds <- rbindlist(ispline_uncertainty_all_thresholds, use.names = T) # filter out the relevant columns : exclude x uncertainties relevant_column_names <- unique(c("model_name", "value_ID", grep("_fullModel_X|_minusSD_Y|fullModel_Y|_plusSD_Y|SDn", names(ispline_uncertainty_all_thresholds), value = T) )) ispline_uncertainty_all_thresholds <- ispline_uncertainty_all_thresholds[, ..relevant_column_names] rm(relevant_column_names) # filter relevant x values # 8350 are just too many x values. Is not more correct if I have more values. # take around 334 values, plus the max value. relevant_rowindices <- c(seq(1, 8350, by = 25), 8350) ispline_uncertainty_all_thresholds <- ispline_uncertainty_all_thresholds[value_ID %in% relevant_rowindices, ] #TODO rename the rownames to 1 : 335 again --> easier to handle! # # # # quality control # plot individual predictor of one model ggplot(ispline_uncertainty_all_thresholds[model_name == "gdm_EFturnover_0.2", .(autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)], aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)) + geom_line() ggplot(ispline_uncertainty_all_thresholds[model_name == "gdm_EFnestedness_0.5", .(autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)], aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y)) + geom_line() # plot individual predictor of all models ggplot(ispline_uncertainty_all_thresholds, aes(x = autotroph.beta.sne_fullModel_X, autotroph.beta.sne_fullModel_Y, col = model_name)) + geom_line() ggplot(ispline_uncertainty_all_thresholds, aes(x = autotroph.beta.sim_fullModel_X, autotroph.beta.sim_fullModel_Y, col = model_name)) + geom_line() # save output # saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty.Rds"))
Calculate $w$ for each value of X for each predictor for each model individually. In other words, calculate how wide the confidence interval is at a given place in X for a given predictor and model. Speciality : All models of the same predictor have the same X values, which are identified with value_ID.
Calculate $w$ : $\text{minusSD_Y} = \mu - \sigma$ and $\text{plusSD_Y} = \mu + \sigma$ . Therefore : $\mu + \sigma - (\mu - \sigma) = \mu + \sigma - \mu + \sigma = 2\sigma$ And thus : $w = \sigma = \frac{\text{plusSD_Y} - \text{minusSD_Y}}{2}$
# ispline_uncertainty_all_thresholds <- readRDS(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty.Rds")) ispline_uncertainty_all_thresholds <- melt(ispline_uncertainty_all_thresholds, id.vars = c("model_name", "value_ID")) # create helper dataset of N ispline_iterations <- unique(ispline_uncertainty_all_thresholds[grep("SDn", variable), .(model_name, variable, value)]) ispline_iterations[, predictor := sub("_SDn", "", variable)] setnames(ispline_iterations, old = "value", new = "SDn") ispline_iterations[, variable := NULL] # clean ispline_uncertainty table ispline_uncertainty_all_thresholds[, predictor := sub("_[a-zA-Z]+_[XY]$", "", variable, perl = T)] ispline_uncertainty_all_thresholds[, predictor := sub("_SDn", "", predictor)] ispline_uncertainty_all_thresholds[, Variable_type := sub("soil_|isolation_", "", sub("^[a-zA-Z.]+_", "", variable, perl = T), perl = T)] ispline_uncertainty_all_thresholds[, variable := NULL] ispline_uncertainty_all_thresholds[grep("X", ispline_uncertainty_all_thresholds$Variable_type), XY := "X"] ispline_uncertainty_all_thresholds[grep("Y", ispline_uncertainty_all_thresholds$Variable_type), XY := "Y"] ispline_uncertainty_all_thresholds[, Model_type := "nestedness"] ispline_uncertainty_all_thresholds[grep("turnover", model_name), Model_type := "turnover"] # 1. create dataset with X values which will be used to get the rest Xtest <- ispline_uncertainty_all_thresholds[XY == "X", ] setnames(Xtest, old = "value", new = "Xvalue") Xtest <- Xtest[, .(model_name, Model_type, predictor, Xvalue, value_ID)] # 2. merge the Y values to this one ispline_uncertainty_all_thresholds <- merge(Xtest, ispline_uncertainty_all_thresholds[XY == "Y", .(model_name, Model_type, predictor, value_ID, value, Variable_type)], by = c("model_name", "predictor", "value_ID", "Model_type")) rm(Xtest) ispline_uncertainty_all_thresholds <- dcast(ispline_uncertainty_all_thresholds, model_name + Model_type + predictor + value_ID + Xvalue ~ Variable_type) # # # # # # Calculate the weight SD_Y # # # # # # (1) CALCULATE UNCERTAINTY (SD) ispline_uncertainty_all_thresholds[, sd := (plusSD_Y - minusSD_Y)/2] range(ispline_uncertainty_all_thresholds$sd) par(mfrow = c(1,2)) hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), main = "distribution of sd\n along all predictors and values") # # # CONSIDERATIONS # # hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), ylim = c(0, 4000)) # hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.95), ylim = c(0, 300), main = "") # # save as "GDM_multifun_thresholds_hist_all_sd.pdf" # dev.off() # # plotting only at value max # par(mfrow = c(1,2)) # hist(ispline_uncertainty_all_thresholds[value_ID == 8350, sd], xlim = c(0, 0.95), ylim = c(0, 300), # main = "distribution of sd\n along all predictors\nat maximum") # hist(ispline_uncertainty_all_thresholds[value_ID == 8350, sd], xlim = c(0, 0.95), ylim = c(0, 60), main = "") # # save as "GDM_multifun_thresholds_hist_all_sd_at_maximum.pdf" # dev.off() # # plotting only at min value # par(mfrow = c(2,2)) # hist(ispline_uncertainty_all_thresholds[value_ID == 1, sd], ylim = c(0, 100), # main = "distribution of sd along all predictors\nat min, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 76, sd], ylim = c(0, 100), # main = "distribution of sd along all predictors\nat 4th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 2476, sd], ylim = c(0, 300), # main = "distribution of sd along all predictors\nat 100th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 4976, sd], ylim = c(0, 300), # main = "distribution of sd along all predictors\nat 200th Xvalue, of total 335") # # save as "GDM_multifun_thresholds_hist_all_sd_at_selected_xvalues.pdf" # dev.off() # # # at selected predictors, at 200th value # par(mfrow = c(2,2)) # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "LUI" & Model_type == "turnover", sd], # main = "distribution of sd LUI turnover\nat 200th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "autotroph.beta.sim" & Model_type == "turnover", sd], # main = "distribution of sd autotroph.beta.sim turnover\nat 200th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "omnivore.protist.beta.sne" & Model_type == "turnover", sd], # main = "distribution of sd omnivore.protist.beta.sne turnover\nat 200th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "Geographic" & Model_type == "turnover", sd], # main = "distribution of sd Geographic turnover\nat 200th Xvalue, of total 335") # # save as "GDM_multifun_thresholds_hist_all_sd_at_selected_xvalues_and_predictors.pdf" # dev.off() # # # # are the 0 sd cases all cases at yvalue = 0 and xvalue = 0? # hist(unique(ispline_uncertainty_all_thresholds[sd == 0, fullModel_Y])) # max(unique(ispline_uncertainty_all_thresholds[sd == 0, fullModel_Y])) # # no, but all at very low values < 0.0051 # # # do small sd occurr at small Y predictions? # ggplot(ispline_uncertainty_all_thresholds, aes(x = sd, y = fullModel_Y, color = interaction(Model_type, predictor))) + # geom_point() + # theme(legend.position = "none") # # one line is one model. # # within a model : correlation of around 1 for sd increases with yvalue # # save as "GDM_multifun_thresholds_hist_sd_increase_with_effectsize.pdf" # # # # # (2) HANDLING 0 CASES # the standard deviation is 0 in some cases: # only occurs at very low predictions < 0.0051. Most are 0, but not all. # we could replace the 0 and very small values with the second smallest value hist(ispline_uncertainty_all_thresholds$sd, xlim = c(0, 0.000001), ylim = c(0, 4000)) min(ispline_uncertainty_all_thresholds$sd[!(ispline_uncertainty_all_thresholds$sd == 0)]) # is way larger than the machine epsilon (e-12 vs e-16). # we could also use a 3-step approach : # a. normalise standard deviations to values between 0 and 1 # b. calculate the weights 1 / sd # and replace the Inf values with 1 (maximum weight) # c. normalise again # but also here, at one point we have to replace 0 or Inf values with # a number. Therefore, we go for the approach where 0 are replaced by epsilon. # ispline_uncertainty_all_thresholds[weight == 0, weight := .Machine$double.eps] ispline_uncertainty_all_thresholds[sd == 0, sd := min(ispline_uncertainty_all_thresholds$sd[!(ispline_uncertainty_all_thresholds$sd == 0)])] # # # # # # (3) CALCULATE WEIGHT ispline_uncertainty_all_thresholds[, weight := 1 / sd] # # # visualise # hist(ispline_uncertainty_all_thresholds$weight) # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "LUI" & Model_type == "turnover", weight], # main = "distribution of unstandardised weight LUI turnover\nat 200th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "autotroph.beta.sim" & Model_type == "turnover", weight], # main = "distribution of unstandardised weight autotroph.beta.sim turnover\nat 200th Xvalue, of total 335") range(ispline_uncertainty_all_thresholds$weight) # # # # # (4) NORMALISE WEIGHTS # # the normalisation needs to be done : # - per value_ID, because small values have smaller standard deviations. the models should get weights accordingly # - per predictor : because ranges differ across predictors # - per model type (turnover vs. nestedness) : because ranges can differ # normalise weights between 0 and 1 # per predictor # per valueID # per model type (turnover vs. nestedness) # each normalisation is a normalisation of 9 values nrow(ispline_uncertainty_all_thresholds) == 18 * 35 * 335 # do the normalisation NOT over all, but grouped by # note : no shifting to zero needed, because values are already shifted to 0. # found this nice data.table code, need to test if does what I want. setDT(ispline_uncertainty_all_thresholds)[, weight_norm := (weight - min(weight, na.rm = T)) / (max(weight, na.rm = T) - min(weight, na.rm = T)), by = list(predictor, value_ID, Model_type)] range(ispline_uncertainty_all_thresholds$weight_norm, na.rm = T) # between 0 and 1 # some caes lead to NaN : sd is everywhere the same. happens ONLY in fullmodelY = 0 cases. # deleteme <- ispline_uncertainty_all_thresholds[is.nan(weight_norm), ][order(predictor, value_ID, Model_type), .(predictor, value_ID, Model_type, fullModel_Y, sd, weight_norm, weight)] # nrow(ispline_uncertainty_all_thresholds[is.nan(weight_norm) & fullModel_Y != 0, ][order(predictor, value_ID, Model_type), .(predictor, value_ID, Model_type, fullModel_Y, sd, weight_norm, weight)]) # # replace NaN values with 1 ispline_uncertainty_all_thresholds[is.nan(weight_norm), weight_norm := 1] # # # CONSIDERATIONS # # is the range within one predictor within one valueID within one model type ok? # hist(ispline_uncertainty_all_thresholds[value_ID == 4976 & predictor == "LUI" & Model_type == "turnover", weight_norm], # main = "distribution of normalised weight LUI turnover\nat 200th Xvalue, of total 335") # hist(ispline_uncertainty_all_thresholds[value_ID == 2476 & predictor == "autotroph.beta.sne" & Model_type == "turnover", weight_norm], # main = "distribution of normalised weight autotroph.beta.sne turnover\nat 100th Xvalue, of total 335") # # # # # (5) CALCULATE WEIGHTED MEAN OF FULLMODEL Y # # calculate the weighted mean of fullModel_Y ispline_uncertainty_all_thresholds[, weighted_Y := fullModel_Y * weight_norm] setDT(ispline_uncertainty_all_thresholds)[, sum_of_weight := sum(weight_norm), by = list(predictor, value_ID, Model_type)] setDT(ispline_uncertainty_all_thresholds)[, sum_of_weighted_Y := sum(weighted_Y), by = list(predictor, value_ID, Model_type)] ispline_uncertainty_all_thresholds[, mean_Y := sum_of_weighted_Y/ sum_of_weight] ispline_uncertainty_all_thresholds[, c("sum_of_weight", "sum_of_weighted_Y", "weighted_Y") := NULL] # # # # # # (6) CALCULATE WEIGHTED SD OF FULLMODEL Y # ispline_uncertainty_all_thresholds[, weighted_xi_minus_mu := weight_norm * ((fullModel_Y - mean_Y)^2)] # sum and square root setDT(ispline_uncertainty_all_thresholds)[, weighted_SD_mean_Y := sqrt(sum(weighted_xi_minus_mu)), by = list(predictor, value_ID, Model_type)] # weighted SE # n is 9, because 9 models # every average is calculated by 9 weighted values ispline_uncertainty_all_thresholds[, weighted_SE_mean_Y := weighted_SD_mean_Y / sqrt(9)] ispline_uncertainty_all_thresholds[, weighted_xi_minus_mu := NULL] # # # # # (5) CALCULATE STANDARD ERROR # # NOTE : not everywhere 100 iterations were possible. Add SDn which gives the number of # iterations in a given model # Calculate SE = sd / sqrt(N) table(ispline_iterations$SDn) ispline_uncertainty_all_thresholds <- merge(ispline_uncertainty_all_thresholds, ispline_iterations, by = c("predictor", "model_name"), all = T) ispline_uncertainty_all_thresholds[, se := sd / sqrt(SDn)] # save output dataset # saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty_weighted_avg.Rds")) # fwrite(ispline_uncertainty_all_thresholds, file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.csv"))
Smoothing
#TODO smooth lines along line plot. Some have "ecken". Do same for sd!
# ispline_uncertainty_all_thresholds <- readRDS(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.Rds")) # ispline_uncertainty_all_thresholds <- fread(paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg.csv")) # Quality checks # with standard deviation ggplot(ispline_uncertainty_all_thresholds[predictor == "autotroph.beta.sne" & Model_type == "nestedness"], aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) + geom_line() + geom_ribbon(aes(ymin = minusSD_Y, ymax = plusSD_Y), alpha = 0.1, linetype = 0) + scale_fill_brewer(palette="Set1") + scale_color_brewer(palette="Set1") + coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") + geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red") # with standard error ggplot(ispline_uncertainty_all_thresholds[predictor == "autotroph.beta.sne" & Model_type == "nestedness"], aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) + geom_line() + geom_ribbon(aes(ymin = fullModel_Y - se, ymax = fullModel_Y + se), alpha = 0.1, linetype = 0) + scale_fill_brewer(palette="Set1") + scale_color_brewer(palette="Set1") + coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") + geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red") ggplot(ispline_uncertainty_all_thresholds[predictor == "LUI" & Model_type == "nestedness"], aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) + geom_line() + scale_fill_brewer(palette="Set1") + scale_color_brewer(palette="Set1") + coord_cartesian(ylim = c(-0.02, 0.1)) + xlab("") + ylab("") + geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red") # ggplot(ispline_uncertainty_all_thresholds[predictor == "plot_isolation" & Model_type == "nestedness",], aes(x = Xvalue, y = fullModel_Y, col = model_name, fill = model_name)) + geom_line() + scale_fill_brewer(palette="Set1") + scale_color_brewer(palette="Set1") + coord_cartesian(ylim = c(-0.02, 0.1)) + geom_line(aes(x = Xvalue, y = mean_Y), linewidth = 1.5, color = "red") + xlab("") + ylab("") #
Structured plotting
Match with plotting table nicenames
setnames(ispline_uncertainty_all_thresholds, old = "predictor", new = "names") # reshape table : bring mean to it's own row test <- melt(ispline_uncertainty_all_thresholds, measure.vars = c("fullModel_Y", "mean_Y", "weighted_SD_mean_Y", "weighted_SE_mean_Y"), id.vars = c("model_name", "Model_type", "value_ID", "names", "Xvalue") ) test[variable == "mean_Y", model_name := paste0("weighted_mean_", Model_type)] test[variable == "weighted_SD_mean_Y", model_name := paste0("weighted_mean_", Model_type)] test[variable == "weighted_SE_mean_Y", model_name := paste0("weighted_mean_", Model_type)] test <- unique(test[variable != "fullModel_Y", ]) test <- dcast(test, model_name + Model_type + value_ID + names + Xvalue ~ variable, value.var = "value") # add the source models again (fullModel_Y) setnames(test, old = c("mean_Y", "weighted_SD_mean_Y", "weighted_SE_mean_Y"), new = c("value", "sd", "se")) setnames(ispline_uncertainty_all_thresholds, old = "fullModel_Y", new = "value") test <- rbindlist(list(test, ispline_uncertainty_all_thresholds[, .(model_name, Model_type, value_ID, names, Xvalue, value, sd, se)]), use.names = T) nrow(test) == nrow(ispline_uncertainty_all_thresholds) + 335 * 35 * 2 # has 335 * 35 * 2 rows more. Should contain 234500 rows. ispline_uncertainty_all_thresholds <- merge(test, nicenames, by = c("names")) setnames(ispline_uncertainty_all_thresholds, old = c("Xvalue"), new = c("xaxis")) # add linewidth for mean model ispline_uncertainty_all_thresholds[, lwd := 1] ispline_uncertainty_all_thresholds[grep("weighted_mean", ispline_uncertainty_all_thresholds$model_name), lwd := 2] # save output dataset # saveRDS(ispline_uncertainty_all_thresholds, file = paste0(pathtoout, "/allmod_LUI_uncertainty_weighted_avg_plotdata.Rds")) # fwrite(ispline_uncertainty_all_thresholds, file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv"))
Plot each predictor individually, showing all model lines with uncertainty (standard error)
pdf("GDM_weighted_avg_over_thresholds_allmods.pdf", onefile = T, paper = "a4r") for(mtype in c("nestedness", "turnover")){ for(pred in unique(ispline_uncertainty_all_thresholds$names)){ p <- create_gdm_lineplot_predictorwise_thresholds( data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ], ribbon = T, ribbontype = "se", legend = T, plottitle = paste(pred, mtype), ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T) ) plot(p) } } dev.off() pdf("GDM_weighted_avg_over_thresholds_allmods_printable.pdf", onefile = T, paper = "a4r") for(mtype in c("nestedness", "turnover")){ for(pred in unique(ispline_uncertainty_all_thresholds$names)){ p <- create_gdm_lineplot_predictorwise_thresholds( data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ], ribbon = T, ribbontype = "se", legend = F, plottitle = paste(pred, mtype), ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T) ) plot(p) } } # add legend p <- create_gdm_lineplot_predictorwise_thresholds( data = ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, ], ribbon = T, ribbontype = "se", legend = T, plottitle = paste(pred, mtype), ymax = min(1, max(ispline_uncertainty_all_thresholds[Model_type == mtype & names == pred, value + se], na.rm = T), na.rm = T) ) ggdraw(get_legend(p)) dev.off()
Nestedness and Turnover : manually edit
lineplot_data <- ispline_uncertainty_all_thresholds[model_name == "weighted_mean_nestedness"] p <- create_gdm_lineplot(lineplot_data[ground == "a"], legend = F, ymax = max(lineplot_data$value), ribbon = F) q <- create_gdm_lineplot(lineplot_data[ground == "b"], legend = F, ymax = max(lineplot_data$value), ribbon = F) # abiotic subplots # LUI luidat <- lineplot_data[names %in% "LUI",] deltaluidat <- lineplot_data[names %in% "deltaLUI",] l <- create_gdm_lineplot(luidat, legend = F, ymax = max(lineplot_data$value), ribbon = F) dl <- create_gdm_lineplot(deltaluidat, legend = F, ymax = max(lineplot_data$value), ribbon = F) lgeo <- create_gdm_lineplot(lineplot_data[names %in% c("Geographic")], legend = F, ymax = max(lineplot_data$value), ribbon = F) lsoil <- create_gdm_lineplot(lineplot_data[names %in% c("edis_soil")], legend = F, ymax = max(lineplot_data$value), ribbon = F) lisol <- create_gdm_lineplot(lineplot_data[names %in% c("plot_isolation")], legend = F, ymax = max(lineplot_data$value), ribbon = F) # create legends biotic_legend <- get_nice_legend(type = "biotic") biotic_legend <- plot_grid(NULL, NULL, NULL, NULL, ggdraw(biotic_legend), NULL, rel_widths = c(0.1, 0.8, 0.1), rel_heights = c(0.01, 0.99)) abiotic_legend <- get_nice_legend(type = "abiotic") abiotic_legend <- plot_grid(NULL, NULL, NULL, NULL, ggdraw(abiotic_legend), NULL, rel_widths = c(0.1, 0.8, 0.1), rel_heights = c(0.01, 0.99)) # combine plots # create classic lineplot lin <- plot_grid( plot_grid(p, q, l, labels = c('A', 'B', 'C'), nrow = 3), plot_grid(biotic_legend, abiotic_legend, nrow = 2, rel_heights = c(0.7, 0.3)) ) lin_abiotic <- plot_grid(dl, lgeo, lsoil, lisol, labels = c("D", "", "", ""), align = T, nrow = 1) all_lines <- plot_grid(lin, lin_abiotic, nrow = 2, rel_heights = c(0.8, 0.2)) # save as A4 pdf: # "GDM_multifun_thresholds_lineplot_allpredictor_nestedness.pdf" # "GDM_multifun_thresholds_lineplot_allpredictor_turnover.pdf"
Many comparisons are done at max height --> better overview.
Dataset can be (1) reduced to max values, and (2) overviewbars are calculated.
(1) reduce to max values
Create dataset restab
ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv")) # reduce to max values restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ] restab[, value_ID := NULL] restab[, xaxis := NULL] # keep average models restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ] # 70 rows left, because 35 predictors in 2 models # reformat table : # model results of turnover and nestedness average models should be found in separate columns restab <- data.table::dcast( data = restab, formula = paste(paste(names(nicenames), collapse = " + "),"~ model_name"), value.var = c("value", "se")) # rename value column to model name names(restab)[grep("value", names(restab))] <- sub("value_", "", grep("value", names(restab), value = T))
(2) Calculate overviewbars
Dataset p_ov_allthres
is produced
a <- create_single_funs_overviewbars(restab2 = restab, rel_colnames = c("weighted_mean_turnover","weighted_mean_nestedness")) # bring to ggplot format ov_ab_allthresholds <- melt(a$above_below, id.vars = c("ground", "color")) ov_tn_allthresholds <- melt(a$turnover_nestedess, id.vars = c("component", "color")) # combine to shared datset ov_ab_allthresholds[, overviewbar_type := "sum_above_below"] ov_tn_allthresholds[, overviewbar_type := "sum_turn_nes"] ov_allthresholds <- rbindlist(list(ov_ab_allthresholds, ov_tn_allthresholds), use.names = T, fill = T) rm(ov_ab_allthresholds); rm(ov_tn_allthresholds) # add id variable ov_allthresholds[, mod_ov_id := paste(variable, overviewbar_type, sep = "_")] # create plots p_ov_allthres <- create_single_funs_overviewbar_plot(singleF_restab = ov_allthresholds, pos = "stack", xvar_name = "mod_ov_id") # (earlier) saved as : <date>_GDM_multifun_thresholds_allthresholds_overview_bars<_dodge> #TODO potentially delete because not used?
# produce barplots for(i in c("weighted_mean_nestedness", "weighted_mean_turnover")){ p <- create_bio_aboveground_barplot(type = "stacked", # type in c("stacked", "grouped") yvar_name = i, plottitle = "gdm_EFnestedness_weighted_mean", errorbar = F) b <- create_bio_belowground_barplot(type = "stacked", yvar_name = i) q <- create_abio_barplot(type = "stacked", yvar_name = i) # produce overviewbars ov <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], pos = "stack", xvar_name = "overviewbar_type") + coord_flip() ovL <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], pos = "stack", xvar_name = "overviewbar_type", legend = T) multipanel_with_overview_bars <- plot_grid(p, b, q, ov, labels = c('A', '', 'B', 'C'), label_size = 12, nrow = 4, rel_heights = c(0.22, 0.32, 0.17, 0.13), align = "v") # 2 summary bars # save ggsave(filename = paste(pathtoout, "/GDM_multifun_thresholds_", i, "_barplot.pdf", sep = ""), plot = multipanel_with_overview_bars, device=cairo_pdf, width = 11.96, height = 8.27, units = "in", dpi = 900) }
show standard errors
for(i in c("weighted_mean_nestedness", "weighted_mean_turnover")){ p <- create_bio_aboveground_barplot(type = "grouped", # type in c("stacked", "grouped") yvar_name = i, plottitle = "gdm_EFnestedness_weighted_mean", errorbar = T, pylim = c(0, 0.44)) b <- create_bio_belowground_barplot(type = "grouped", yvar_name = i, errorbar = T, pylim = c(0, 0.44)) q <- create_abio_barplot(type = "grouped", yvar_name = i, errorbar = T, pylim = c(0, 0.44)) # produce overviewbars ov <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], pos = "stack", xvar_name = "overviewbar_type") + coord_flip() ovL <- create_single_funs_overviewbar_plot(ov_allthresholds[variable == i, ], pos = "stack", xvar_name = "overviewbar_type", legend = T) multipanel_with_overview_bars <- plot_grid(p, b, q, ov, labels = c('A', '', 'B', 'C'), label_size = 12, nrow = 4, rel_heights = c(0.24, 0.34, 0.11, 0.105), align = "v") # 2 summary bars # save ggsave(filename = paste(pathtoout, "/GDM_multifun_thresholds_", i, "_barplot_se.pdf", sep = ""), plot = multipanel_with_overview_bars, device=cairo_pdf, width = 11.96, height = 8.27, units = "in", dpi = 900) }
ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv")) # reduce to max values restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ] restab[, value_ID := NULL] restab[, xaxis := NULL] # reformat table : # model results in separate columns restab <- data.table::dcast( data = restab, formula = paste(paste(names(nicenames), collapse = " + "),"~ model_name"), value.var = c("value"))
All models including average
include <- c("nicenames", names(restab)[grep("turnover|nestedness", names(restab), perl = T)]) pal <- RColorBrewer::brewer.pal(9, "YlOrRd") df <- restab[ground == "a", ] df <- data.table::melt(df[, ..include], id.vars = c("nicenames")) df[value == 0, value := NA] # 0 values are set to NA to make them appear grey in the plot a <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) + geom_tile() + scale_fill_gradientn(colours = pal, na.value = "grey80", limits = c(0, 0.35), labels = c(0, 0.1, 0.2, 0.3, 0.35)) + theme(axis.text.x = element_text(angle = 90, vjust = 1, size = 9, hjust = 1), plot.margin = margin(l = 50, r = 20), axis.line.x=element_blank(), axis.text.y = element_text(size=9, angle = 0), axis.ticks.x = element_blank(), panel.grid.major.x = element_line(color = "grey"), axis.title = element_blank(), legend.position = c(-0.11, 1.2)) + scale_x_discrete(position = "top") + labs(fill = "") + guides(fill = guide_colourbar(ticks.colour = "black", frame.colour = "black")) # aL <- get_legend(a) # a <- a + theme(legend.position = "none") df <- restab[ground == "b", ] df <- data.table::melt(df[, ..include], id.vars = c("nicenames")) df[value == 0, value := NA] b <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) + geom_tile() + scale_fill_gradientn(colours = pal, na.value = "grey80") + theme(legend.position = "none", axis.title = element_blank(), axis.text.y = element_text(size=9, angle = 0), plot.margin = margin(l = 50, r = 20), axis.text.x = element_blank(), axis.line.x=element_blank(), axis.ticks.x = element_blank(), panel.grid.major.x = element_line(color = "grey")) df <- restab[ground == "x", ] df <- data.table::melt(df[, ..include], id.vars = c("nicenames")) df[value == 0, value := NA] c <- ggplot(data = df, aes(x = variable, y = nicenames, fill = value)) + geom_tile() + scale_fill_gradientn(colours = pal, na.value = "grey80") + theme(legend.position = "none", axis.title = element_blank(), axis.text.y = element_text(size=9, angle = 0), plot.margin = margin(l = 50, r = 20), axis.text.x = element_blank(), axis.line.x=element_blank(), axis.ticks.x = element_blank(), panel.grid.major.x = element_line(color = "grey")) heatmap <- plot_grid(a, b, c, labels = c('A', '', 'B'), label_size = 12, nrow = 3, rel_heights = c(0.52, 0.39, 0.09), align = "v") ggsave(filename = paste(pathtoout, "/plot_GDM_thresholds_heatmap.pdf", sep = ""), plot = heatmap, device = cairo_pdf, width = 11.96, height = 8.27, units = "in", dpi = 900)
clustered heatmap without average models
include <- include[-which(include %in% c("weighted_mean_nestedness", "weighted_mean_turnover"))] restab1 <- restab[, ..include] restab1 <- as.matrix(restab1[, -1]) rownames(restab1) <- restab$nicenames colnames(restab1) <- gsub("EFnestedness_", "NES ", gsub("EFturnover_", "TO ", gsub("gdm_", "", gsub("_LUI", "", colnames(restab1))))) heatmap(restab1, hclustfun = hclust) # manually saved as "plot_GDM_thresholds_heatmap_cluster.pdf"
Create a plot with average effect sizes across the 2 main models : EFturnover_WEIGHTED_mean, EFnestedness_WEIGHTED_mean.
Average across 2 main models. (remember the 2 main models are weighted averages of all models)
ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv")) # reduce to max values restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ] restab[, value_ID := NULL] restab[, xaxis := NULL] # keep average models restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ] # 70 rows left, because 35 predictors in 2 models # filter biotic effects # add together turnover and nestedness effects # sum turnover and nestedness of each model restab <- aggregate(restab[type == "bio", ], value ~ model_name + legendnames, FUN = sum) # quality check plot(as.factor(restab$model_name), restab$value) # are in same range # average restab <- data.table(aggregate(restab, value ~ legendnames, FUN = mean)) # sort and plot restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T) restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"] setorder(restab, cols = "value") restab[, legendnames := factor(legendnames, levels = restab$legendnames)] p <- ggplot(data = restab, aes(x = legendnames, y = value, fill = color)) + geom_bar(stat = "identity", color = "black") + coord_flip() + scale_fill_identity() + ylab("averages across main models") + xlab("") + labs(title = "Average effect size \nacross main models", caption = "Average across 2 main models EFturn and EFnes.\n Both are weighted averages of effect sizes across the 9 thresholds.") p # saved as : "GDM_multifun_thresholds_synthesis_biotic_ranks.pdf" A5 landscape
Weighted averages of two main models.
ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv")) # reduce to max values restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ] restab[, value_ID := NULL] restab[, xaxis := NULL] # keep average models restab <- restab[model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ] # 70 rows left, because 35 predictors in 2 models # filter biotic effects # add together turnover and nestedness effects # sum turnover and nestedness of each model restab1 <- aggregate(restab[type == "bio", ], value ~ model_name + legendnames, FUN = sum) restab2 <- aggregate(restab[type == "bio", ], sd ~ model_name + legendnames, FUN = function(x) sqrt(sum(x^2))) restab <- data.table(merge(restab1, restab2, by = c("model_name", "legendnames"))) rm(restab1, restab2) # calculate weighted means restab[, weighted_means := value * (1 / sd)] restab[, weight := 1 / sd] restab1 <- data.table(aggregate(restab, weighted_means ~ legendnames, FUN = mean)) restab2 <- data.table(aggregate(restab, weight ~ legendnames, FUN = sum)) restab1 <- merge(restab1, restab2, by = "legendnames") restab1[, avg_across_models := weighted_means / weight] # divide mean of weighted effects by sum of weights restab <- restab1[, .(legendnames, avg_across_models)] rm(restab1, restab2) # sort and plot restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T) restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"] setorder(restab, cols = "avg_across_models") restab[, legendnames := factor(legendnames, levels = restab$legendnames)] pw <- ggplot(data = restab, aes(x = legendnames, y = avg_across_models, fill = color)) + geom_bar(stat = "identity", color = "black") + coord_flip() + scale_fill_identity() + ylab("averages across main models") + xlab("") + labs(title = "Weighted average effect size \nacross main models", caption = "Weighted average across 2 main models EFturn and EFnes. (1) Calculated weighted average effect size across 9 thresholds of both turnover and nestedness models (including se) (2) Weighted average of the turnover and nestedness value weighted by sd.") pw
Average ranks
ispline_uncertainty_all_thresholds <- fread(file = paste0(pathtodata, "/analysis/output_datasets/uncertainty_calc/allmod_LUI_uncertainty_weighted_avg_plotdata.csv")) # reduce to max values restab <- ispline_uncertainty_all_thresholds[value_ID == 8350, ] restab[, value_ID := NULL] restab[, xaxis := NULL] restab <- restab[!model_name %in% c("weighted_mean_turnover", "weighted_mean_nestedness"), ]# exclude average models # filter biotic effects # add together turnover and nestedness effects # sum turnover and nestedness of each model restab1 <- aggregate(restab[type == "bio", ], value ~ model_name + legendnames, FUN = sum) restab2 <- aggregate(restab[type == "bio", ], sd ~ model_name + legendnames, FUN = function(x) sqrt(sum(x^2))) restab <- data.table(merge(restab1, restab2, by = c("model_name", "legendnames"))) rm(restab1, restab2) # calculate ranks per model_name. Higher ranks correspond to higher effect sizes setDT(restab)[, ranks := rank(value), by = list(model_name)] # average ranks across models restab <- data.table(aggregate(restab, ranks ~ legendnames, FUN = mean)) # convert to plotting dataset restab <- merge(restab, unique(nicenames[, .(legendnames, color)]), by = "legendnames", all.x = T) restab[legendnames == "secondary consumer arthropodsoillarvae", legendnames := "secondary consumer\narthropodsoillarvae"] setorder(restab, cols = "ranks") restab[, legendnames := factor(legendnames, levels = restab$legendnames)] pr <- ggplot(data = restab, aes(x = legendnames, y = ranks, fill = color)) + geom_bar(stat = "identity", color = "black") + coord_flip() + scale_fill_identity() + ylab("averages across main models") + xlab("") + labs(title = "Average rank across \nEFturn and EFnes models", caption = "Average ranks across 18 main models EFturn and EFnes.\n not weighted by sd.") pr
Compare different synthesis plots
plot_grid(p, pw, pr, ncol = 3)
If average, weighted average or average ranks are taken makes a big difference. We go with average ranks, because : average ranks give a similar weight to all effects, while averaging effect size (both weighted and unweighted) give more weight to very large effects. For the synthesis plot, we want to know if some groups continuously have a considerable effect.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.