knitr::opts_chunk$set(echo = FALSE, collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) options(rmarkdown.html_vignette.check_title = FALSE)
The OBIC is a framework that takes a multitude of soil parameters and variables from agricultural fields and ultimately gives a single value expressing the soil quality of that field. To take this multitude of measured, modeled and calculated values to a single value between 0 and 1, three aggregation steps take place as illustrated below.
# include graphic knitr::include_graphics('../vignettes/OBIC_score_integratie_2.png')
There is no scientific principle dictating how this aggregation should be done and there are several ways to do the aggregation. For example; averaging, linearly weighted averaging, logarithmically weighted averaging. The last one is used in OBIC.
This document dives deeper into the three aggregation steps within the framework and will explain why logarithmically weighted aggregation is chosen. We will also compare the three methods of aggregation to illustrate the influence the aggregation method on the final score, which provides a kind of sensitivity analysis.
The aggregation methods will be compared with a mock dataset of a selection of soil functions.
Demonstration of aggregation will be performed using the dataset binnenveld
. The dataset contains soil properties from 11 agricultural fields in the surroundings of Wageningen, with different soil texture and land use, and is documented in ?binnenveld
.
# load packages library(OBIC); library(data.table); library(ggplot2); library(patchwork) setDTthreads(1) # load data dt <- OBIC::binnenveld[ID==1]
After calculating soil function scores and in prior to the aggregation procedure, a reformatting step takes place. The reformatting step consists of the following tasks:
weight.obic
table.# Step 3 Reformat dt given weighing per indicator and prepare for aggregation ------------------ # load weights.obic (set indicator to zero when not applicable) w <- as.data.table(OBIC::weight.obic) # Add years per field dt[,year := 1:.N, by = ID] # Select all indicators used for scoring cols <- colnames(dt)[grepl('I_C|I_B|I_P|I_E|I_M|year|crop_cat|SOILT',colnames(dt))] # Melt dt and assign main categories for OBI dt.melt <- melt(dt[,mget(cols)], id.vars = c('B_SOILTYPE_AGR','crop_category','year'), variable.name = 'indicator') # add categories relevant for aggregating # C = chemical, P = physics, B = biological, BCS = visual soil assessment # indicators not used for integrating: IBCS and IM dt.melt[,cat := tstrsplit(indicator,'_',keep = 2)] dt.melt[grepl('_BCS$',indicator) & indicator != 'I_BCS', cat := 'IBCS'] dt.melt[grepl('^I_M_',indicator), cat := 'IM'] # Determine number of indicators per category dt.melt.ncat <- dt.melt[year==1 & !cat %in% c('IBCS','IM')][,list(ncat = .N),by='cat'] # add weighing factor to indicator values dt.melt <- merge(dt.melt,w[,list(crop_category,indicator,weight_nonpeat,weight_peat)], by = c('crop_category','indicator'), all.x = TRUE) # calculate correction factor for indicator values (low values have more impact than high values, a factor 5) dt.melt[,cf := cf_ind_importance(value)] # calculate weighted value for crop category dt.melt[,value.w := value] dt.melt[grepl('veen',B_SOILTYPE_AGR) & weight_peat < 0,value.w := -999] dt.melt[!grepl('veen',B_SOILTYPE_AGR) & weight_nonpeat < 0,value.w := -999]
# YF: I think this paragraph is not necessary # After reformatting the data in step 3, an indicator data.table is created in step 4. This data.table uses the soil function scores adjusted for their applicability for the soiltype and crop category. In step 4 indicators are calculated to display them as output. In step 5 the total OBI score is calculated, since part of step 5 overlaps with step 4, we only discuss step 5 onwards.
To aggregate scores, the relevant columns and rows are taken from the molten data.table.
# Step 5 Add scores ------------------ # subset dt.melt for relevant columns only out.score <- dt.melt[,list(cat, year, cf, value = value.w)] # remove indicator categories that are not used for scoring out.score <- out.score[!cat %in% c('IBCS','IM','BCS')]
The indicators within each category are aggregated to a single score per category (chemical, physical, biological, management, environmental) using the correction factor (vcf) calculated previously using cf_ind_importance()
. This correction factor vcf gives a higher weight to indicators with a lower score, as:
$$
vcf = 1/(I+0.2)
$$
where I is the score of the indicator.
This way, the lowest indicator, supposedly also the most limiting factor for crop production, becomes more important. Consequently, improving a low scoring indicator by 0.1 has a greater impact on the aggregated category score than improving a high scoring indicator by the same amount, making it more worthwhile to invest in the poorest and most limiting indicator.
Subsequently, the score of each category is computed by summing up the weighted values (scaled by the sum of all weights) of all soil indicators within the category: $$ S = \sum_{i}(I_{i} \frac{vcf_{i}}{\sum_{i}vcf_{i}}) $$ where S is the score of the category, vcfi is the weighing factor of the indicator i, Ii is the score of the indicator i. This gives a single score for each of the five indicator categories (chemical, physical, biological, management, and environmental) for a specific year.
# calculate weighted average per indicator category out.score <- out.score[,list(value = sum(cf * pmax(0,value) / sum(cf[value >= 0]))), by = list(cat,year)] # for case that a cat has one indicator or one year and has NA out.score[is.na(value), value := -999]
To account for the entire crop rotation, OBIC aggregates scores of multiple years. For the aggregation over years, another correction factor ycf is used to give more weight to recent years on a logarithmic scale. OBIC set the maximum length of period as 10 years, as crop rotation in the Netherlands are hardly ever longer than 10 years. When data older than 10 years ago is used, then those years get the same weight as 10 years ago. ycf is formulated as: $$ ycf = ln(12 - min(y, 10)) $$ where y is the length of years before the assessment. y = 1 means the year for which the assessment is conducted for (i.e. the most recent year).
This gives the correction factors for a period of eleven years as follows (from the most recent years to 11 years before):
# create data y <- 1:11 cf <- log(12 - pmin(10, y)) cat(round(cf, 3))
The most recent year carries about r round(cf[1]/cf[10],digits =1)
times the weight of the tenth year. Notice that years ten and eleven have the same correction factor value, the minimum ycf value for a year is equal to that of year ten.
More priority (weight) is given to recent years because they better reflect the current situation. Additionally, changes in management or soil properties have a more visible effect on the scores in the recent years.
Aggregation of scores over years is done with the following two lines of code. This is analogue to the aggregation procedure within each category as described above (i.e. sum of weighted score, scaled by the sum of all weighing factors).
The aggregation procedure is coded in the following lines.
# calculate correction factor per year; recent years are more important out.score[,cf := log(12 - pmin(10,year))] # calculate weighted average per indicator category per year out.score <- out.score[,list(value = sum(cf * pmax(0,value)/ sum(cf[value >= 0]))), by = cat]
This gives us a single score for each of the five indicator categories (chemical, physical, biological, management, and environmental), without time dimension.
The scores of five indicator categories are aggregated to a single, holistic, OBI-score. The category scores are weighed logarithmically based on the number of indicators underlying the category. The number of indicators per category was retrieved previously with the line
dt.melt.ncat <- dt.melt[year==1 & !cat %in% c('IBCS','IM')][,list(ncat = .N),by='cat']
.
Now its merged with our score data.table.
# merge out with number per category out.score <- merge(out.score,dt.melt.ncat, by='cat')
The correction factor for each category, ccf, are computed based on the number of indicators as:
$$
ccf = ln(ncat+1)
$$
where ncat is the number of the underlying indicators within the category.
The weights for categories with 1 to 10 indicators are: r round(log(1:10 +1),2)
. Thus, a category based on 10 indicators affects the total score roughly r round(log(1:10 +1)[10]/log(1:10 +1)[1],1)
times more than a category based on 1 indicator. The idea behind giving more weight to categories with more underlying indicators sprouts from the idea that such a category is better supported by measurable data and better understood.
Finally, the total OBIC score is calculated by summing up the weighted scores of 5 categories and dividing it by the sum of the weighing factors, in the same way as the other 2 aggregation steps.
# this text is probably wrong # Furthermore, by aggregating indicators to categories and then to a score rather than directly from indicators to a score; individual indicators from categories with few underlying indicators, affect the holistic score more than indicators in categories with many indicators. For example, if there is one biological indicator, its weight in affecting the holistic score is log(1+1)= `r round(log(1+1),2)`, while a indicator within a chemical indicator with nine indicators individually only weighs log(9+1)/9= `r round(log(9+1)/9,2)`. While on category level, biology only weighs `r round(log(1+1),2)` and chemical `r round(log(9+1),2)`.
This aggregation procedure is coded in the following lines.
# calculate weighing factor depending on number of indicators out.score[,cf := log(ncat + 1)] # calculated final obi score out.score <- rbind(out.score[,list(cat,value)], out.score[,list(cat = "T",value = sum(value * cf / sum(cf)))])
After the aggregation there is just a bit of code to format the names of the scores.
# update element names out.score[,cat := paste0('S_',cat,'_OBI_A')] out.score[, value := round(value,3)]
# make mock data and calculate scores with different aggregation methods (averaging without weight and averaging with linearly changing weight) # data like: # soil_function_value|indicator|group|year|cf_base|cf_noweight|cf_linearweight|score_base|score_noweight|score_linearweight # visualise differences # make veldnr fieldid <- 1 # define standard deviation std <- 0.2 # make indicator inds <- c('I_C_CEC', 'I_C_CU', 'I_C_K', 'I_C_MG', 'I_C_N', 'I_C_P', 'I_C_PH', 'I_C_S', 'I_C_ZN', 'I_B_DI', 'I_B_SF', 'I_E_NGW', 'I_E_NSW', 'I_M', 'I_P_CEC', 'I_P_CO', 'I_P_CR', 'I_P_DS', 'I_P_DU', 'I_P_SE', 'I_P_WRI', 'I_P_WS') # make jaar year <- 1:10 # combine in dt dt <- data.table(field = sort(rep(fieldid,length(inds)*length(year))), indicator = sort(rep(inds, length(year)*length(fieldid))), year = rep(year, length(inds)*length(fieldid)) ) # add category dt <- dt[,cat := tstrsplit(indicator,'_',keep = 2)] # iteratively add fields dto <- data.table(field = NULL, indicator = NULL, year = NULL) for(i in 1:100){ dtn <- dt dtn <- dtn[,field := i] dto <- rbindlist(list(dto, dtn)) } # dto is a almost ready set of 100 fields, only values and value description need to be added set.seed(222) # helper function to make random values within 0:1 rtnorm <- function(n, mean = 0, sd = 1, min = 0, max = 1) { bounds <- pnorm(c(min, max), mean, sd) u <- runif(n, bounds[1], bounds[2]) qnorm(u, mean, sd) } # make baseline dt1 <- copy(dto) dt1 <- dt1[,field := field+100-1] dt1 <- dt1[,treatment := 'baseline'] dt1 <- dt1[,value := rtnorm(n = nrow(dt1),mean = 0.7, sd = std)] # make treatment where c = 0.3 dt2 <- copy(dto) dt2 <- dt2[,field := field+200-1] dt2 <- dt2[,treatment := 'low C values'] dt2 <- dt2[cat == 'C',value := rtnorm(n = nrow(dt2[cat=='C']),mean = 0.3, sd = std)] dt2 <- dt2[!cat == 'C',value := rtnorm(n = nrow(dt2[!cat=='C']),mean = 0.7, sd = std)] # make treatment where B = 0.3 dt3 <- copy(dto) dt3 <- dt3[,field := field+300-1] dt3 <- dt3[,treatment := 'low B values'] dt3 <- dt3[cat == 'B',value := rtnorm(n = nrow(dt3[cat=='B']),mean = 0.3, sd = std)] dt3 <- dt3[!cat == 'B',value := rtnorm(n = nrow(dt3[!cat=='B']),mean = 0.7, sd = std)] # make treatment where one C indicator = 0 dt4 <- copy(dto) dt4 <- dt4[,field := field+400-1] dt4 <- dt4[,treatment := 'one low C'] dt4 <- dt4[indicator == 'I_C_CEC',value := 0] dt4 <- dt4[!indicator == 'I_C_CEC',value := rtnorm(n = nrow(dt4[!indicator == 'I_C_CEC']),mean = 0.7, sd = std)] # make where one B indicator = 0 dt5 <- copy(dto) dt5 <- dt5[,field := field+500-1] dt5 <- dt5[,treatment := 'one low B'] dt5 <- dt5[indicator == 'I_B_DI',value := 0] dt5 <- dt5[!indicator == 'I_B_DI',value := rtnorm(n = nrow(dt5[!indicator == 'I_B_DI']),mean = 0.7, sd = std)] # make treatment where recent years score low and old years high dt6 <- copy(dto) dt6 <- dt6[,field := field+600-1] dt6 <- dt6[,treatment := 'Recent years low'] dt6 <- dt6[year %in% 1:5, value := rtnorm(n = nrow(dt6[year %in% 1:5]), mean = 0.3, sd = std)] dt6 <- dt6[!year %in% 1:5, value := rtnorm(n = nrow(dt6[!year %in% 1:5]), mean = 0.7, sd = std)] # make treatment where recent years score high and old years low dt7 <- copy(dto) dt7 <- dt7[,field := field+700-1] dt7 <- dt7[,treatment := 'Recent years high'] dt7 <- dt7[!year %in% 1:5, value := rtnorm(n = nrow(dt7[!year %in% 1:5]), mean = 0.3, sd = std)] dt7 <- dt7[year %in% 1:5, value := rtnorm(n = nrow(dt7[year %in% 1:5]), mean = 0.7, sd = std)] # combine all data dta <- rbindlist(list(dt1, dt2, dt3, dt4, dt5, dt6, dt7)) # make sure all values are between 0 and 1 if(any(dta$value >1|dta$value<0)){cat('values outside acceptable bounds')}
By now, we have some understanding of how measured soil function data are aggregated to an integral score within the current OBIC framework. So, now we can explore and reflect on some of the choices that were made in designing this aggregation process. The first choice we will reflect upon is that of the correction factors. In the OBIC framework, these are determined logarithmically but could also be determined linearly or not be used at all. Second, we will reflect on our choice of 2-step aggregation (i.e. first aggregated to categories and then aggregated to a holistic score),instead of aggregating indicators directly to a holistic score.
To reflect on alternative aggregation methods we have made a mock data.table similar to a data.table in the obic_field function just before aggregating scores. We will compare the aggregation methods with seven scenarios or treatments. The treatments are as follows:
Each treatment has a 100 replicates whose soil function scores are randomly drawn from a distribution with a standard deviation of approximately r std[1]
. The mean of the distributions depends on the scenario. All values are in the 0 to 1 range.
ggplot(dta, aes(x = value, fill = cat)) + geom_histogram(bins = 40) + theme_bw() +facet_wrap(~treatment, ncol = 4)
# get relevant data from dta dtat <- dta[, mean(value), by = c( 'cat', 'treatment')] # rounc V1 dtat <- dtat[, V1 := round(V1, digits = 3)] # improve category descrition dtat <- dtat[,cat := paste('mean', cat)] # dcast dtat <- dcast(dtat ,treatment~cat, value.var = 'V1') # factorise and order treatment dtat <- dtat[, treatment := factor(treatment, levels = c('baseline', 'low B values', 'low C values', 'one low B', 'one low C', 'Recent years high', 'Recent years low'))] # order dtat setorder(dtat) # make table knitr::kable(dtat, caption = 'Mean scores per category for each scenario')
To compare the influence of different correction factors, we conducted the three aggregation steps with three methods to compute a correction factor (cf):
The log and linear cf's are illustrated in Figure 2 in the range that they operate. value is the correction factor for the indicator, which ranges between 0 to 1. year is the correction factor for the year a measurement is from, 1 being the most recent year, 10 being ten years earlier. ncat is the correction for the number of soil indicators within a category. For Chemical indicators, this typically is r length(grep('I_C_', names(obic_field_dt(binnenveld[ID==1], output = 'indicators'))))
.
The slope of the linear correction factor is chosen such that the highest and lowest correction factor is identical between linear and logarithmic scale. In practice, another slope could be chosen for linear aggregation.
The no correction method is not presented in Figure 2 as it would be a horizontal line with an arbitrary value.
# plot correction factors pdtlog <- data.table(x = c(seq(0,1,0.1),rep(0:10,2)), cf_type = c(rep('value',11),rep('year',11), rep('ncat', 11))) # calc cf's log pdtlog <- pdtlog[cf_type == 'value', cf := OBIC::cf_ind_importance(x)] pdtlog <- pdtlog[cf_type == 'year', cf := log(12 - pmin(10,x))] pdtlog <- pdtlog[cf_type == 'ncat', cf := log(x + 1)] pdtlog[,cf_method := 'log'] # calc cf's linear pdtlin <- data.table(x = c(seq(0,1,0.1),rep(0:10,2)), cf_type = c(rep('value',11),rep('year',11), rep('ncat', 11))) pdtlin <- pdtlin[cf_type == 'value', cf := 5-4.17*x] pdtlin <- pdtlin[cf_type == 'year', cf := 2.59-0.19*x] pdtlin <- pdtlin[cf_type == 'ncat', cf := x*log(11)/10] pdtlin[,cf_method := 'linear'] # combine pdt <- rbindlist(list(pdtlog, pdtlin)) # format pdt pdt <- pdt[,cf_type := factor(cf_type, levels = c('value', 'ncat', 'year'))] # plot gg <- ggplot(pdt, aes(x = x, y = cf, color = cf_method, group = cf_type))+ geom_point() + theme_bw() + facet_wrap(~cf_type, ncol = 3, scales = 'free') + scale_colour_viridis_d()+ xlab('') + ylab('cf (weight)') # plot a line in each for(i in 1:uniqueN(pdt$cf_method)){ gg <- gg + geom_line(data = pdt[cf_method == unique(pdt$cf_method)[i]], color = c('#FDE725FF', '#440154FF')[i]) } # plot gg gg
# Use three ways to calc correction factors (giving weight to each value), log (standard in OBIC), lin (linearly increasing/decreasing), non (everything has the same weight) # value correction factors ====== dt <- copy(dta) # function to add correction factors addcf <- function(dt){ # copy input dt.int <- copy(dt) # add correction factor for indicator value dt.int[,log := OBIC::cf_ind_importance(value)] dt.int[,lin := 5-4.17*value] dt.int[,non := 1] # melt dt by cf method dt.int <- melt(dt.int, measure.vars = c('log', 'lin', 'non'), value.name = 'v_cf', variable.name = 'cf_method') # calculate cf for cat ==== dt.int[,ncat := .N,by=c('field','year','cat','cf_method')] # add correction factor for number of categories dt.int[cf_method == 'log',c_cf := log(ncat + 1)] dt.int[cf_method == 'lin',c_cf := ncat*log(10 + 1)/10] dt.int[cf_method == 'non',c_cf := 1] # dd correction factor for number of years dt.int[cf_method == 'log',y_cf := log(12 - pmin(10,year))] dt.int[cf_method == 'lin',y_cf := 2.59-0.19*year] dt.int[cf_method == 'non',y_cf := 1] } # add correction factors dt <- addcf(dt)
# Use three ways to calc correction factors (giving weight to each value), log (standard in OBIC), lin (linearly increasing/decreasing), non (everything has the same weight) # for log and lin, 4 scenarios are added (c_cf only, v_cf only, y-cf only, or all) # value correction factors ====== dtp <- copy(dta) # function to add correction factors addcf2 <- function(dt){ # copy input dt.int <- copy(dt) # add correction factor for indicator value dt.int[,log_all := OBIC::cf_ind_importance(value)] dt.int[,log_vcf := OBIC::cf_ind_importance(value)] dt.int[,log_ccf := 1] dt.int[,log_ycf := 1] dt.int[,lin_all := 5-4.17*value] dt.int[,lin_vcf := 5-4.17*value] dt.int[,lin_ccf := 1] dt.int[,lin_ycf := 1] dt.int[,non := 1] # melt dt by cf method dt.int <- melt(dt.int, measure.vars = c('log_all', 'log_vcf', 'log_ccf', 'log_ycf', 'lin_all', 'lin_vcf', 'lin_ccf', 'lin_ycf', 'non'), value.name = 'v_cf', variable.name = 'cf_method') # calculate cf for cat ==== dt.int[,ncat := .N,by=c('field','year','cat','cf_method')] # add correction factor for number of categories dt.int[cf_method == 'log_all',c_cf := log(ncat + 1)] dt.int[cf_method == 'log_ccf',c_cf := log(ncat + 1)] dt.int[cf_method == 'log_vcf',c_cf := 1] dt.int[cf_method == 'log_ycf',c_cf := 1] dt.int[cf_method == 'lin_all',c_cf := ncat*log(10 + 1)/10] dt.int[cf_method == 'lin_ccf',c_cf := ncat*log(10 + 1)/10] dt.int[cf_method == 'lin_vcf',c_cf := 1] dt.int[cf_method == 'lin_ycf',c_cf := 1] dt.int[cf_method == 'non',c_cf := 1] # dd correction factor for number of years dt.int[cf_method == 'log_all',y_cf := log(12 - pmin(10,year))] dt.int[cf_method == 'log_ycf',y_cf := log(12 - pmin(10,year))] dt.int[cf_method == 'log_vcf',y_cf := 1] dt.int[cf_method == 'log_ccf',y_cf := 1] dt.int[cf_method == 'lin_all',y_cf := 2.59-0.19*year] dt.int[cf_method == 'lin_ycf',y_cf := 2.59-0.19*year] dt.int[cf_method == 'lin_vcf',y_cf := 1] dt.int[cf_method == 'lin_ccf',y_cf := 1] dt.int[cf_method == 'non',y_cf := 1] return(dt.int) } # add correction factors dt3 <- addcf2(dtp)
# make function to aggregate scores aggscores <- function(dt) { # copy input dt.agg <- copy(dt) # calculate weighted value per category and year dt.agg <- dt.agg[,list(w.value = sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])), y_cf = mean(y_cf), c_cf = mean(c_cf)), by = .(treatment,field, cf_method, cat, year)] # calculated weighted average value per category (so mean over years) dt.agg <- dt.agg[,list(wy.value = sum(y_cf * pmax(0, w.value) / sum(y_cf[w.value >= 0])), c_cf = mean(c_cf)), by = .(treatment,field, cf_method, cat)] # calculated weighted average value per field (so mean over categories) dt.agg.tot <- dt.agg[,list(value = sum(wy.value * c_cf / sum(c_cf))), by = .(treatment,field, cf_method)] # output dt.out <- rbind(dt.agg[,.(treatment,field,cf_method,cat,value = wy.value)], dt.agg.tot[,.(treatment,field,cf_method,cat='total',value)]) } # add scores to dt dt.out <- aggscores(dt) dt3.out <- aggscores(dt3)
aggscores_brent <- function(dt) { # copy input dt.agg <- copy(dt) # calculate weighted value per category and year dt.agg <- dt.agg[,w.value := sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])), by = .(field, cf_method, cat, year)] # calculated weighted average value per category (so mean over years) dt.agg <- dt.agg[,wy.value := sum(y_cf * pmax(0, w.value) / sum(y_cf[w.value >= 0])), by = .(field, cf_method, cat)] # scores per category # calculate total obi score (log method) dt.agg <- dt.agg[,S_T := sum(wy.value * c_cf / sum(c_cf)), by = .(field, cf_method)] # calculate total obi score (lin method) dt.agg[, c_cf_lin := ncat*log(10 + 1)/10] dt.agg <- dt.agg[,S_T_catlin := sum(wy.value * c_cf_lin / sum(c_cf_lin)), by = .(field, cf_method)] # calculate total obi score (non method) dt.agg[, c_cf_non := 1] dt.agg <- dt.agg[,S_T_catnon := sum(wy.value * c_cf_non / sum(c_cf_non)), by = .(field, cf_method)] # calculate total obi score if not aggregated by cat dt.agg <- dt.agg[, nocat.value := sum(v_cf* pmax(0,value) / sum(v_cf[value >= 0])), by = .(field, cf_method, year)] dt.agg <- dt.agg[, S_T_nocat := sum(y_cf * pmax(0, nocat.value) / sum(y_cf[nocat.value >= 0])), by = .(field, cf_method)] # select data for scores dts <- unique(dt.agg[,.(field, indicator,cat, wy.value, S_T ,treatment, cf_method, S_T_nocat, S_T_catlin, S_T_catnon)]) # reshape dts so total scores are in same column as cat scores (with T being a cat) dts1 <- unique(dts[,.(field, cat, wy.value, treatment, cf_method)]) dts2 <- unique(dts[,.(field, S_T, treatment, cf_method)]) dts3 <- unique(dts[,.(field, S_T_nocat, treatment, cf_method)]) dts4 <- unique(dts[,.(field, S_T_catlin, treatment, cf_method)]) dts5 <- unique(dts[,.(field, S_T_catnon, treatment, cf_method)]) # rename cols setnames(dts1, 'wy.value', 'score') setnames(dts2, 'S_T', 'score') setnames(dts3, 'S_T_nocat', 'score') setnames(dts4, 'S_T_catlin', 'score') setnames(dts5, 'S_T_catnon', 'score') # add cat column to dts2 dts2$cat <- 'T_cf_log' dts3$cat <- 'T_nocat' dts4$cat <- 'T_cf_lin' dts5$cat <- 'T_cf_non' # bind scores dt's dt.agg <- rbindlist(list(dts1, dts2, dts3, dts4, dts5), use.names = TRUE) # update element names dt.agg[,cat := paste0('S_',cat)] dt.agg[, score := round(score,3)] # factorise cat and cf_method dt.agg <- dt.agg[, cat := factor(cat, levels = c('S_T_cf_log', 'S_C', 'S_P', 'S_B', 'S_E','S_M', "S_T_cf_lin", "S_T_cf_non", "S_T_nocat"))] dt.agg <- dt.agg[, cf_method := factor(cf_method, levels = c('log', 'lin', 'non'))] } dt2 <- aggscores_brent(dt)
The effects of different aggregation methods on total score are shown in the box plot. Red dashed lines show arithmetic means of all 22 indicators.
# arithmetric mean of all indicators (this should be equal to lin_ccf?) arimean <- dt[cf_method == "non", arimean := mean(value), by = c("field", "treatment")] arimean <- dta[,arimean := mean(value), by = c("field", "treatment")] arimean2 <- unique(arimean[, .(field, treatment, arimean)]) arimean3 <- arimean2[, median(arimean), by = treatment] # plot ggplot(dt.out[cat == 'total'], aes(x = value, y = cf_method)) + geom_vline(data = arimean3, aes(xintercept = V1), col = "red", lty = 2) + geom_boxplot() + theme_bw() + scale_colour_viridis_d() + scale_y_discrete(limits = rev) + coord_cartesian(xlim = c(0,1)) + facet_wrap(~treatment, ncol = 1)
# get relevant data from dt dtt <- dt.out[, list(value = round(mean(value),3)), by = c( 'cat', 'cf_method', 'treatment')] # improve category discretion dtt <- dtt[,ct := paste('mean', cat)] # dcast dtt <- dcast(dtt ,treatment+cf_method~cat, value.var = 'value') # factorise and order treatment dtt <- dtt[, treatment := factor(treatment, levels = c('baseline', 'low B values', 'low C values', 'one low B', 'one low C', 'Recent years high', 'Recent years low'))] # order setorder(dtt, treatment, cf_method) # make table knitr::kable(dtt, caption = 'Mean scores per category and total per aggregation method') # make dtt version with just total scores to use for in text reporting dttt <- dtt[,.(treatment, cf_method, S_T_OBI_A = total)]
# factorise dta cat levels # dta <- dta[, cat := factor(cat, levels = c('C', 'P', 'B', 'E'))] # dta <- dta[, indicator := factor(indicator, levels = c("I_C_CEC","I_C_CU", "I_C_K", "I_C_MG", "I_C_N", "I_C_P", "I_C_PH", "I_C_S", "I_C_ZN", # "I_P_CR", "I_P_DS", "I_P_DU", "I_P_SE", "I_P_WRI", "I_P_WS","I_P_CEC","I_P_CO", # "I_B_DI", "I_B_SF", "I_E_NGW","I_E_NSW"))] # # # plot # ggplot(dta, aes(x = value, y = indicator, color = cat))+ # geom_boxplot() + # theme_bw() + coord_cartesian(xlim = c(0,1)) + scale_y_discrete(limits = rev)+ # facet_wrap(~treatment, ncol = 1)
In the baseline scenario, total scores are slightly lower when aggregating logarithmically or linearly compared to using no special aggregation method. In all three methods, the indicator values in the baseline are around r round(mean(dta[treatment == 'baseline', value]),2)
, this number is preserved in the score when averaging all indicator values (cf_method = non) while 'log' and 'lin' scores are on average r round(mean(dta[treatment == 'baseline', value]) - dtt[treatment == 'baseline'& cf_method == 'log',total],2)
and r round(mean(dta[treatment == 'baseline', value]) - dtt[treatment == 'baseline'& cf_method == 'lin',total],2)
lower. The change of 0.05 (in the score ranging between 0 and 1) may seem small, but it is quite large compared to the standard deviation of the distribution of the indicator values (which is 0.2). So, the difference in aggregation methods can influence the total score substantially.
The 'lin' and 'log' methods yield lower average scores because they penalize low indicator values (i.e. left-hand side in their distribution) with the weighing factor vcf. Since the 'lin' and 'log' methods were harmonised for their highest value at low indicator value (cf = 5 for value = 0), the penalty at intermediate indicator values are much higher for 'lin' method. This makes the average value of the OBIC score lower for 'lin' than 'log' methods. The other two aggregation steps (year aggregation (with ycf ) and category aggregation (with ccf)) causes no difference in the baseline scenario.
Note that the variation of total score is not larger for 'lin' and 'log' than 'non', irrespective of the larger variations in the correction factors for 'lin' and 'log'. To illustrate the patterns in more details, we zoom up to the baseline scenario (Fig 4). Here we split the 'log' method into 4 variations: when 'log' method is applied for all 3 aggregation steps ('log_all'), only for the indicator aggregation ('log_vcf'), year aggregation ('log_ycf'), or category aggregation ('log_ccf'). Same applies to 'lin' method. Standard deviation of total score is slightly smaller when 'log' or 'lin' method is used (SD = r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "log_all", sd(value)],3)
and r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "lin_all", sd(value)],3)
, respectively) compared to when no special aggregation method is used ('non', SD = r round(dt3.out[cat == 'total' & treatment =="baseline" & cf_method == "non", sd(value)],3)
). Looking closer at different steps of aggregation, the variation increases due to indicator aggregation (vcf) and the year aggregation (ycf), but decreased due to category aggregation (ccf). As a result, the variation becomes smaller when linear or log method is applied to all 3 aggregation steps, largely owing to the category aggregation step. Category aggregation, irrespective of the method used for correction factor, increases variation in total score because an extreme indicator value in categories with a few indicators is strongly reflected in the total score and therefore increase the variation, compared to when total scores is directly computed from all indicators. By using 'log' or 'lin' methods in category aggregation, both of which give smaller weights to categories with a few indicators, this impact is diluted, and therefore the increase in variation becomes limited.
# plot (different colors per method, 'lin' and 'log' are divided into 4 variation) dt3.out[, method_main := substr(cf_method, 1, 3)] dt3.out[, alpha := 1] dt3.out[!grepl("_all|non", cf_method), alpha := 0.5] ggplot(dt3.out[cat == 'total' & treatment =="baseline"], aes(x = value, y = cf_method, fill = method_main, alpha = alpha)) + # geom_vline(data = arimean3, aes(xintercept = V1), col = "red", lty = 2) + geom_boxplot() + theme_bw() + scale_colour_viridis_d() + scale_y_discrete(limits = rev) + labs(fill = "")+ guides(alpha = F)+ xlab("total score") + ylab("scenario") + coord_cartesian(xlim = c(0,1)) # check standard deviation #dt3.out[cat == 'total' & treatment =="baseline", sd(value), by = cf_method]
In the two scenario's where one category performs poorly; 'low B values' and 'low C values', we can see the effect of using a correction factor for the number of indicators in a category. The total score in 'low B values' of 'non' method dropped by r dtt[treatment == 'baseline'& cf_method == 'non', total]-dtt[treatment == 'low B values'& cf_method == 'non', total]
from the baseline. This decrease is larger than the other two methods: 'log' and 'lin' dropped r dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'low B values'& cf_method == 'log', total]
and r dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'low B values'& cf_method == 'lin', total]
from the baseline scenario. In contract, in scenario 'low C values, 'log', 'lin' dropped more drastically than 'non': the decrease in total scores was r paste0(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'low C values'& cf_method == 'log', total], ', ', dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'low C values'& cf_method == 'lin', total], ' and ', round(dtt[treatment == 'baseline'& cf_method == 'non', total]-dtt[treatment == 'low C values'& cf_method == 'non', total],3))
points for 'log', 'lin', and 'non' methods, respectively. The contrasting patterns in the change in the total score illustrate the effect of the aggregation from the 5 category scores to a single total score, which is based on the number of indicators in each category. The 'lin' and 'log' aggregation methods are more sensitive to low scores in C (a category with 9 indicators) than low scores in B (which has 2 indicators) compared to 'non' which is equally sensitive to a low B or C value. Furthermore, 'log' gives relatively heavier weight to a category with a few indicators relative to that with many indicators than 'log': the ratio between the weight on C and B scores is r (9*log(10 + 1)/10)/(2*log(10 + 1)/10)
for 'lin' and r round(log(9+1)/log(2+1), 1)
for 'log'.
The sensitivity of the scores to the number of indicators in a category is even more pronounced when looking at scenario's 'one low C' and 'one low B', where either one chemical indicator or one biological indicator was set to 0. There is no difference between these scenario's for the 'non' aggregation method, in both scenario's a score of around r round(mean(dtt[treatment %in% c('one low C', 'one low B')& cf_method == 'non', total]),3)
is achieved, r round(abs(mean(dtt[treatment %in% c('one low C', 'one low B')& cf_method == 'non', total])-dtt[treatment == 'baseline'& cf_method == 'non', total]),3)
points lower than in the baseline. When using 'log' and 'lin' methods, scores drop r paste0(round(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'one low C'& cf_method == 'log', total], 3), ' and ',dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'one low C'& cf_method == 'lin', total])
in one low C and r paste0(round(dtt[treatment == 'baseline'& cf_method == 'log', total]-dtt[treatment == 'one low B'& cf_method == 'log', total], 3), ' and ',round(dtt[treatment == 'baseline'& cf_method == 'lin', total]-dtt[treatment == 'one low B'& cf_method == 'lin', total],3))
in one low B. Note that the drop in 'one low B' and 'one low C' scenarios compared to the baseline scenario is larger for 'log' than 'lin' method, showing that 'log' method is suitable for highlighting poorly-scoring indicators. In other words, 'lin' method imposes relatively severe penalty on moderately-scoring indicators, whereas it does not let poorly-scoring indicators stand out as much as 'log' method does.
The scenario's 'Recent years high' and 'Recent years low' were included to illustrate the effect of different methods to aggregate multiple year records. In 'Recent years low' scenario, scores calculated with 'lin' and 'log' are substantially lower than 'non' method, because the lower scores in recent years gained heavier weight. In 'Recent years high' scenario, 'log' and 'lin' scores only slightly higher than 'non'. Although the high recent values in 'lin' and 'log' received higher weights in the aggregation of multiple years, they are canceled out by old, low indicator values, which are severely penalized from being low.
Using correction factors, either logarithmically or linearly, can make scores more responsive to recent year, low indicator values, and categories with many underlying indicators. The magnitude of the sensitivity depends on the parameter values of the correction factors. Thus, the sensitivity presented above is merely the consequence of our current (arbitrary) choice of the parameter values, and it can be adjusted when necessary. The scores calculated with the logarithmic and linear aggregation method do not deviate largely, yet they tune the scores in slightly different ways. The difference is summarized below:
Grouping indicators in categories and aggregating these categories to a single score is a choice, its not mathematically necessary. Here we look deeper at how our choice of the category aggregation influence the behavior of the OBIC score.
In the figure below, we calculated the total OBIC score both with category aggregation ('S_T_cf_log'; with log' method) and without category aggregation ('S_T_nocat'; total score was calculated directly from all indicators). The other 2 aggregation steps were done with 'log' method for both scenarios.
When indicators of 1 category have low scores (i.e. 'low B values' and 'low C values'), the total score becomes higher when the category aggregation is executed. This is because the category aggregation assures that the impact of a category on total score does not exceed a fixed proportion: the contribution of C, P, B, E and M category to the total score is r round(100*log(9+1)/sum(log(c(9,8,2,2,1)+1)),0)
, r round(100*log(8+1)/sum(log(c(9,8,2,2,1)+1)),0)
, r round(100*log(2+1)/sum(log(c(9,8,2,2,1)+1)),0)
, r round(100*log(2+1)/sum(log(c(9,8,2,2,1)+1)),0)
, r round(100*log(1+1)/sum(log(c(9,8,2,2,1)+1)),0)
%, respectively. When no category aggregation is done, then influence of the poor indicator values can influence the total score more prominently.
When a single indicator has low scores ('one low B' and 'one low C') similar patterns were observed: the impact of the poor indicator on total score was smaller when category aggregation was one, especially in 'one low B' scenario.
Another advantage of using category aggregation is that it gives interpretable intermediate products of OBIC, in the form of 5 separate category scores. As shown in the previous section,the effects of the 3 aggregation steps on total score are large, and it is difficult to trace where and how the total score is influenced by different aggregation steps. In this light, computing category scores before aggregating to a total score is a nice way to provide disentangled insights to the users, allowing them to interpret different aspects of soils separately.
#dt <- copy(dta) #dt <- addcf(dt) #dt2 <- aggscores_brent(dt) dt2[, colv := "no category aggregation"] dt2[grepl("S_T_cf", cat), colv := "with category aggregation"] gg3 <- ggplot(dt2[cat %in% c('S_T_cf_log', #'S_T_cf_lin', 'S_T_cf_non', 'S_T_nocat') & cf_method == 'log' & !treatment %in% c('Recent years low', 'Recent years high')], aes(x = score, y = cat)) + geom_boxplot(aes(fill = colv)) + labs(fill="") + xlab("total score") + ylab("")+ theme_bw() + scale_y_discrete(limits = rev) + coord_cartesian(xlim = c(0,1)) + facet_wrap(~treatment, ncol = 1) #+ geom_boxplot(data = dt[cat == 'S_Tnocat_OBI_A'], mapping = aes(fill = 'blue')) gg3
obic_field(B_SOILTYPE_AGR = dt$B_SOILTYPE_AGR, B_GWL_CLASS = dt$B_GWL_CLASS, B_SC_WENR = dt$B_SC_WENR, B_HELP_WENR = dt$B_HELP_WENR, B_AER_CBS = dt$B_AER_CBS, B_LU_BRP = dt$B_LU_BRP, A_SOM_LOI = dt$A_SOM_LOI, A_SAND_MI = dt$A_SAND_MI, A_SILT_MI = dt$A_SILT_MI, A_CLAY_MI = dt$A_CLAY_MI, A_PH_CC = dt$A_PH_CC, A_N_RT = dt$A_N_RT, A_CN_FR = dt$A_CN_FR, A_S_RT = dt$A_S_RT, A_N_PMN = dt$A_N_PMN, A_P_AL = dt$A_P_AL, A_P_CC = dt$A_P_CC, A_P_WA = dt$A_P_WA, A_CEC_CO = dt$A_CEC_CO, A_CA_CO_PO = dt$A_CA_CO_PO, A_MG_CO_PO = dt$A_MG_CO_PO, A_K_CO_PO = dt$A_K_CO_PO, A_K_CC = dt$A_K_CC, A_MG_CC = dt$A_MG_CC, A_MN_CC = dt$A_MN_CC, A_ZN_CC = dt$A_ZN_CC, A_CU_CC = dt$A_CU_CC, output = 'obic_score')
The scenario's in the experiment above use artificial data and may not be representative of actual fields in the Netherlands. Therefore we will explore the aggregation methods described above using indicator values of fields in the binnenveld
dataset.
# cleanup bini if required if(exists('bini')){rm(bini)} # select columns bcols <- names(binnenveld)[!grepl('BCS$', names(binnenveld))] # get indicator values per field, for first 10 fields for(i in unique(binnenveld$ID)[1:10]){ # calc OBIC inidicators for i bini.n <- obic_field_dt(binnenveld[ID == i,..bcols], output = 'unaggregated') # re add ID bini.n <- bini.n[,ID := i] if(!exists('bini')){ # if bini doesn't exist yet make it bini <- bini.n } else{ # if bini exists, add bini.n to bini (binnenveld indicators) bini <- rbindlist(list(bini, bini.n)) } } # remove inidicators not used for scoring bini <- bini[!cat %in% c('BCS', 'IBCS', 'IM')] # remove irrelevant columns rmcols <- names(bini)[!grepl('^weight|cf|w$', names(bini))] bini <- bini[,..rmcols] # rename ID to field setnames(bini, 'ID', 'field') # add treatment bini$treatment <- bini$field
# add correction factors bini <- addcf(bini) # add scores bini <- aggscores(bini)
# make labels ldt <- binnenveld[,.(ID, B_LU_BRP, B_SOILTYPE_AGR, B_GWL_CLASS)] ldt <- ldt[ID %in% unique(ID)[1:10]] # get most occurring soil type and crop type ldt <- ldt[, lapply(.SD, function (x) names(sort(table(x),decreasing = TRUE)[1])), .SDcols = c('B_LU_BRP','B_SOILTYPE_AGR', 'B_GWL_CLASS'),by = ID] ldt[, B_LU_BRP := as.integer(B_LU_BRP)] # add crop name ldt <- merge(ldt, crops.obic[,.(crop_code, crop_name)], by.x = 'B_LU_BRP', by.y = 'crop_code') # order ldt setorder(ldt, ID) # make cat labels more readable bini[grepl('^T', cat), lcat := "Total"] bini[grepl('^C', cat), lcat := "Chemical"] bini[grepl('^B', cat), lcat := "Biological"] bini[grepl('^P', cat), lcat := "Physical"] bini[grepl('^M', cat), lcat := "Management"] bini[grepl('^E', cat), lcat := "Environmental"] bini[, lcat := factor(lcat, levels = c('Chemical', 'Physical', 'Biological', 'Environmental', 'Management','Total'))] # make plot gg <- ggplot(bini, aes(x= lcat, y= value, color = cf_method)) + geom_point(size = 2,alpha = 0.5) + ylab('OBI-score') + xlab('') + theme_bw(12) + theme(panel.grid.major.x = element_blank(), panel.grid.major.y = element_line(size = 0.5), panel.grid.minor.x = element_blank()) + coord_cartesian(xlim = c(0,1))+ scale_x_discrete(limits = rev)+ scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1))+ coord_flip() + facet_wrap(~field, ncol = 3) + theme(legend.position = c(0.8,0.1)) # show gg gg # show table with data on binnenveld fields knitr::kable(ldt[,.(ID, B_SOILTYPE_AGR, B_GWL_CLASS, crop_name)], caption = 'Soiltype, groundwaterclass and most frequent crop per binnenveld field.')
The binnenveld
dataset lacks actual information on management parameters. In the dataset these were all set to FALSE
, resulting in low management scores for all fields. For most total and categorical scores, the highest scores are obtained with the 'non' aggregation method and the lowest with 'lin'.
When the aggregation step to categories is omitted, total scores differ little for some fields (eg. 2, 4, 5, 8), but are lower when using 'log' or 'lin' in other fields (eg. 3, 6, 7, 9)
The OBI is not the first attempt to express soil quality with a single score, see for example: @Rutgers2012 and @VanWijnen2012.
These authors described a method to numerically express the performance of ecosystem service provision of fields with what they call: the Ecosystem Performance Index (EPX). An EPX was calculated using a set of measured properties of a field and of a reference. The reference, dubbed the Maximum Ecological Potential (MEP) was derived by selecting the best performing fields in a sample for a given land-use and soil type. Depending on the ecosystem service, a selecting of soil properties was made that acted as proxy. An EPX was calculated by comparing the selected soil properties of a field with those of the reference/MEP like this:
$$
EPX = 10^{-{\left(\frac{+\left|\log\left(\frac{VAR^i_{obs}}{VAR^i_{ref}}\right)\right|+\dots-\left|\log\left(\frac{VAR^{j}{obs}}{VAR^{j}{ref}}\right)\right|}{n}\right)}}
$$
Where VAR~obs~ is a soil property of an observed field and VAR~ref~ a soil property of the MEP, n is the number of distinct soil properties used to derive the EPX. Normally a variable's contribution to the EPX is calculated using the i-type, where any deviation of the observed property from the MEP negatively affects the EPX. For some properties*ecosystem services a positive or negative deviation of the observed value from the MEP can be positive for the EPX, in such a case, the j-type has to be used. @Rutgers2012 provides as example, where more soil organic matter (SOM) is always positive for providing a certain service, if an observed SOM content is higher than the reference, the j-type is used and SOM contributes negatively to the deviation from the MEP. Consequently, if the j-type is applied, the EPX of a field can be larger then one (provided that all i-type variables are at or very close to the reference).
The indices of different services can be aggregated to a single score by taking the arithmetic mean. A weighted mean was also calculated based on the relative importance stakeholders assign to each service (here, a farmer, water manager and national government representative were used).
This method relies on statistical modeling for the selection of soil properties relevant for specific services rather then empirical relations. Furthermore, the reliance on BPJ for determining the reference makes the method vulnerable to the available experts and data. Both are drawbacks the authors recommend addressing.
In addition, we found that this method is sensitive to the numeric spread, values of a soil property are likely to have. When calculating natural attenuation capacity according to @VanWijnen2012 using observed and reference values from @Rutgers2008, we found that one of the parameters FMA, had a much larger impact on the EPX than the others. Values of FMA within the 5^th^ and 95^th^ percentile ranged from 14 to 3960, while pH for example, ranged from 7.3 to 7.7
# make table with reference data from Rutgers2008 ac.mep <- data.table(var = c('FMA', 'PotC', 'PotN', 'SOM', 'pH', 'PAL'), mep.v = c(2700, 18, 2.0, 2.2, 7.6, 47), soiltype = rep('klei', 6), landuse = rep('akkerbouw', 6), nl.mean = c(1150,22, 2, 2.5, 7.5, 47), nl5perc = c(14, 9, 0.5, 1.6, 7.3, 31), nl95perc = c(3960, 48, 3.7, 3.6, 7.7, 62)) # make data dtS <- data.table(x = seq(1.6,3.6,length.out = 100), var = 'SOM') dtF <- data.table(x=seq(10,3990,length.out = 100), var = 'FMA') dtF <- dtF[x>200] dtH <- data.table(x=seq(7.3,7.7,length.out = 100), var = 'pH') dtP <- data.table(x=seq(31,62,length.out = 100), var = 'PAL') dtC <- data.table(x=seq(9,48,length.out = 100), var = 'PotC') dtN <- data.table(x=seq(0.5,3.7,length.out = 100), var = 'PotN') # combine data dta <- rbindlist(list(dtS, dtF, dtH, dtP, dtC, dtN)) # add mep values dta <- merge(dta, ac.mep[,.(var, mep.v)], by = 'var') # indicate when to use i or j type dta[,type := 'i'] dta[(var == 'FMA' & x<mep.v) | var == 'SOM'& x>mep.v, type := 'j'] # calc y dta[,y := fifelse(type == 'i',10^-(abs(log10(x/mep.v)/6)), 10^(abs(log10(x/mep.v)/6)))] # plot ggplot(dta) + geom_line(aes(x = x, y = y, col = type))+ ylab("NAC score if all other variables are at MEP") + xlab("Variable value") + # labs(title = 'j (red) and i (black) type plot, j-type is applied') + facet_wrap(facets = ~var, scales = 'free_x', ncol = 3) + theme_bw()
# e.g. (van Wijnen et. al. 2012 and Rutgers et. al. 2012) or Moebius-Clune 2016
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.