Getting Started with SQIpro: Comprehensive Soil Quality Index

knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 5,
  warning = FALSE, message = FALSE
)
library(SQIpro)

Introduction

Soil quality is defined as "the capacity of a specific kind of soil to function within natural or managed ecosystem boundaries, to sustain plant and animal productivity, maintain or enhance water and air quality, and support human health and habitation" [@doran1994].

The Soil Quality Index (SQI) provides a single numeric score (0–1) that integrates multiple soil physical, chemical, and biological indicators. SQIpro implements six established methods for computing SQI, a flexible variable scoring framework, and publication-quality visualisation tools — all in a single CRAN-compliant R package.

Key Concepts

| Term | Definition | |------|-----------| | Minimum Data Set (MDS) | The smallest subset of soil variables that adequately characterises soil quality [@andrews2004] | | Scoring function | A transformation that converts raw variable values to a 0–1 score | | SQI | Weighted or unweighted combination of variable scores | | Optimum | The value or range of a variable associated with best soil function |


Installation

# From CRAN (once published)
install.packages("SQIpro")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourname/SQIpro")

The Example Dataset

SQIpro ships with soil_data, a hypothetical dataset of 100 soil samples across 5 land-use systems and 2 depths, containing 12 soil indicators.

data(soil_data)
dim(soil_data)
head(soil_data)
table(soil_data$LandUse, soil_data$Depth)

Step 1: Validate Your Data

Always validate before analysis:

result <- validate_data(
  soil_data,
  group_cols = c("LandUse", "Depth")
)
result$n_per_group

Step 2: Define Variable Configuration

This is the most important step. Each variable receives a scoring type:

cfg <- make_config(
  variable    = c("pH",   "EC",   "BD",   "OC",   "MBC",
                  "PMN",  "Clay", "WHC",  "DEH",  "AP",  "TN"),
  type        = c("opt",  "less", "less", "more", "more",
                  "more", "opt",  "more", "more", "more","more"),
  opt_low     = c(6.0,   NA,     NA,     NA,     NA,
                  NA,     20,     NA,     NA,     NA,    NA),
  opt_high    = c(7.0,   NA,     NA,     NA,     NA,
                  NA,     35,     NA,     NA,     NA,    NA),
  description = c(
    "Soil pH (H2O 1:2.5)",
    "Electrical Conductivity (dS/m)",
    "Bulk Density (g/cm3)",
    "Organic Carbon (%)",
    "Microbial Biomass Carbon (mg/kg)",
    "Potentially Mineralizable N (mg/kg)",
    "Clay content (%)",
    "Water Holding Capacity (%)",
    "Dehydrogenase Activity (ug TPF/g/day)",
    "Available Phosphorus (mg/kg)",
    "Total Nitrogen (%)"
  )
)
print(cfg)

Verify scoring curves

Before proceeding, always visualise your scoring curves to confirm biological plausibility:

plot_scoring_curves(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))

Step 3: Score All Variables

scored <- score_all(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
head(scored[, c("LandUse", "Depth", "pH", "OC", "MBC")])

Step 4: Select the Minimum Data Set (MDS)

mds <- select_mds(scored,
                  group_cols     = c("LandUse", "Depth"),
                  load_threshold = 0.6)
mds$mds_vars

Step 5: Compute SQI Using All Six Methods

Linear Scoring

Equal-weight additive index [@doran1994]:

res_lin <- sqi_linear(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_lin)

Regression-Based

Variables weighted by stepwise regression coefficients [@masto2008]:

res_reg <- sqi_regression(scored, cfg,
                           dep_var    = "OC",
                           group_cols = c("LandUse", "Depth"),
                           mds_vars   = mds$mds_vars)
print(res_reg)

PCA-Based

Variables weighted by variance explained [@andrews2004; @bastida2008]:

res_pca <- sqi_pca(scored, cfg,
                   group_cols = c("LandUse", "Depth"),
                   mds        = mds)
print(res_pca)

Fuzzy Logic

Arithmetic or geometric mean of fuzzy membership scores:

res_fuz <- sqi_fuzzy(scored, cfg,
                     group_cols = c("LandUse", "Depth"),
                     mds_vars   = mds$mds_vars,
                     operator   = "mean")
print(res_fuz)

Entropy Weighting

Objective weights derived from Shannon entropy [@shannon1948]:

res_ent <- sqi_entropy(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       mds_vars   = mds$mds_vars)
print(res_ent)
cat("\nEntropy weights:\n")
print(attr(res_ent, "entropy_weights"))

TOPSIS

Multi-criteria ranking by ideal solution distance [@hwang1981]:

res_top <- sqi_topsis(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_top)

Compare All Methods

comparison <- sqi_compare(scored, cfg,
                           group_cols = c("LandUse", "Depth"),
                           dep_var    = "OC",
                           mds        = mds)
print(comparison)

Step 6: Visualisation

Score Heatmap

plot_scores(scored, cfg,
            group_cols = c("LandUse", "Depth"),
            group_by   = "LandUse",
            facet_by   = "Depth")

SQI Bar Chart

plot_sqi(res_lin,
         sqi_col   = "SQI_linear",
         group_col = "LandUse",
         fill_col  = "Depth")

Radar Profile

# Requires the 'fmsb' package: install.packages("fmsb")
plot_radar(scored, cfg,
           group_col  = "LandUse",
           group_cols = c("LandUse", "Depth"))

PCA Biplot

plot_pca_biplot(mds, scored, group_col = "LandUse")

Step 7: Statistical Analysis

ANOVA

# Compute per-observation index
scored$SQI_obs <- rowMeans(scored[, mds$mds_vars], na.rm = TRUE)

aov_res <- sqi_anova(scored,
                     sqi_col   = "SQI_obs",
                     group_col = "LandUse")
cat("ANOVA significant:", aov_res$significant, "\n")
print(aov_res$compact_letters)

Sensitivity Analysis

sa <- sqi_sensitivity(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       method     = "linear",
                       mds_vars   = mds$mds_vars)
print(sa)

Sensitivity Tornado Chart

plot_sensitivity(sa)

Recommended Workflow Diagram

raw data
   │
   ▼
validate_data()          ← check structure, missing values, groups
   │
   ▼
make_config()            ← define scoring type per variable
   │
   ▼
plot_scoring_curves()    ← verify biological plausibility
   │
   ▼
score_all()              ← transform all variables to 0–1
   │
   ▼
select_mds()             ← PCA-based minimum data set selection
   │
   ├──► sqi_linear()
   ├──► sqi_regression()
   ├──► sqi_pca()
   ├──► sqi_fuzzy()
   ├──► sqi_entropy()
   └──► sqi_topsis()
             │
             ▼
        sqi_compare()    ← unified results table + ranking
             │
             ▼
      Visualisation      ← plot_sqi(), plot_scores(), plot_radar()
             │
             ▼
      Statistics         ← sqi_anova(), sqi_sensitivity()

References




Try the SQIpro package in your browser

Any scripts or data that you put into this service are public.

SQIpro documentation built on April 20, 2026, 5:06 p.m.