| select_mds | R Documentation |
Identifies the most informative subset of soil variables (the Minimum Data Set, MDS) using Principal Component Analysis (PCA). Only variables with high factor loadings on principal components with eigenvalue > 1 (Kaiser criterion) are retained. Where several variables load highly on the same component, the one with the highest summed correlation to the others is kept as their representative, which minimises redundancy in the MDS.
This approach follows the widely cited method of Andrews et al. (2004)
and Sharma et al. (2008), and is equivalent to the PCAIndex
algorithm in Wani et al. (2023).
select_mds(
data,
group_cols = "LandUse",
load_threshold = 0.5,
vif_threshold = 10,
n_pc = "auto",
verbose = TRUE
)
data |
A data frame of scored or raw soil variables (numeric
columns only, or with group columns specified in group_cols).
group_cols |
Character vector of grouping columns to exclude from
the analysis. Default: "LandUse".
load_threshold |
Numeric in (0, 1). Minimum absolute factor
loading for a variable to be considered for MDS membership.
Default: 0.5.
vif_threshold |
Numeric. Maximum allowable Variance Inflation
Factor among MDS variables. Variables exceeding this are iteratively
removed. Set to |
n_pc |
Integer or "auto". Default: "auto".
verbose |
Logical. Print MDS selection summary. Default: TRUE.
**Algorithm steps:**
1. Standardise all numeric variables (mean = 0, sd = 1).
2. Perform PCA; retain components with eigenvalue > 1.
3. For each retained component, identify variables with absolute loading ≥ load_threshold.
4. Among those, select the variable with the highest sum of absolute Pearson correlations to all others in the set (i.e., the variable that best represents the group, making the others redundant).
5. Optionally, remove variables with high Variance Inflation Factor (VIF > vif_threshold) among the MDS candidates.
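The steps above can be sketched in base R. This is an illustration only, not the package's implementation: `sketch_mds` and the toy data are hypothetical, loadings are assumed to be eigenvectors scaled by the component standard deviation (so they read as variable-component correlations), and VIF is taken from the diagonal of the inverse correlation matrix.

```r
# Hypothetical helper illustrating the five steps; not the package's code.
sketch_mds <- function(df, load_threshold = 0.5, vif_threshold = 10) {
  x <- scale(df)                                 # step 1: mean 0, sd 1
  pca <- prcomp(x, center = FALSE, scale. = FALSE)
  eig <- pca$sdev^2                              # eigenvalues
  keep <- which(eig > 1)                         # step 2: Kaiser criterion

  # Assumption: loading = eigenvector * component sd
  load <- sweep(pca$rotation[, keep, drop = FALSE], 2, pca$sdev[keep], `*`)
  r <- abs(cor(x))                               # absolute Pearson correlations

  picks <- character(0)
  for (j in seq_along(keep)) {
    # step 3: variables loading highly on component j
    cand <- rownames(load)[abs(load[, j]) >= load_threshold]
    if (length(cand) == 0) next
    if (length(cand) == 1) {
      picks <- c(picks, cand)
      next
    }
    # step 4: highest summed correlation to the other candidates
    s <- rowSums(r[cand, cand, drop = FALSE]) - 1  # drop the diagonal 1
    picks <- c(picks, cand[which.max(s)])
  }
  picks <- unique(picks)

  # step 5: drop the highest-VIF variable until all VIFs pass
  while (length(picks) > 1) {
    vif <- diag(solve(cor(x[, picks, drop = FALSE])))
    if (max(vif) <= vif_threshold) break
    picks <- picks[-which.max(vif)]
  }
  picks
}

# Toy data: three variables driven by one latent factor, two by another
set.seed(1)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a = f1 + rnorm(n, sd = 0.3),
  b = f1 + rnorm(n, sd = 0.3),
  c = f1 + rnorm(n, sd = 0.3),
  d = f2 + rnorm(n, sd = 0.3),
  e = f2 + rnorm(n, sd = 0.3)
)
res <- sketch_mds(toy)
res
```

With data built this way, the sketch should keep one representative from each correlated group. The real function may differ in how it breaks ties, handles missing values, or honours n_pc.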
A list of class sqi_mds with:
Character vector of selected MDS variable names.
Character vector of all candidate variable names.
The PCA result object.
Matrix of factor loadings.
Numeric vector of eigenvalues.
Numeric vector of variance explained (%) per component.
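As a rough guide to how the loadings, eigenvalues, and variance explained relate to one another, the following sketch derives them from a base R prcomp fit. It uses the built-in iris measurements as a stand-in for soil data, and it assumes loadings are eigenvectors scaled by the component standard deviation; the package may define them differently.

```r
# Stand-in data: four numeric columns from the built-in iris data set
x <- scale(iris[, 1:4])            # standardise: mean 0, sd 1
pca <- prcomp(x)                   # PCA on the standardised data

eigenvalues <- pca$sdev^2          # variance captured by each component
var_explained <- 100 * eigenvalues / sum(eigenvalues)  # percent per component

# Assumption: loading = eigenvector * component sd, i.e. the correlation
# between each variable and each component
loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

round(eigenvalues, 3)
round(var_explained, 1)
round(loadings, 3)
```

For standardised data the eigenvalues sum to the number of variables, which is why an eigenvalue above 1 marks a component that carries more variance than any single original variable.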
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework: A quantitative soil quality evaluation method. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
Kaiser, H.F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151. doi:10.1177/001316446002000116
Sharma, K.L., et al. (2008). Long-term soil management effects on soil quality indices. Geoderma, 144, 290–300. doi:10.1016/j.geoderma.2007.11.019
data(soil_data)

# Scoring configuration: one entry per soil variable
cfg <- make_config(
  variable = c("pH","EC","BD","OC","MBC","PMN","Clay","WHC","DEH","AP","TN"),
  type     = c("opt","less","less","more","more","more",
               "opt","more","more","more","more"),
  opt_low  = c(6.0, NA, NA, NA, NA, NA, 20, NA, NA, NA, NA),
  opt_high = c(7.0, NA, NA, NA, NA, NA, 35, NA, NA, NA, NA)
)

# Score the variables, then select the MDS from the scored data
scored <- score_all(soil_data, cfg, group_cols = c("LandUse","Depth"))
mds <- select_mds(scored, group_cols = c("LandUse","Depth"))
mds$mds_vars