RFE_SCE    R Documentation
This function implements Recursive Feature Elimination (RFE) to identify the most important predictors for SCE models. At each iteration it trains an SCE model, evaluates its performance, and removes the least important predictors as ranked by Wilks' feature importance scores. The function supports both single and multiple predictants, validates its inputs, and tracks performance across iterations.
The package also provides a Plot_RFE function for visualizing RFE results, showing validation and testing R2 values as a function of the number of predictors.
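The elimination idea can be illustrated with a minimal, self-contained sketch in base R. It is not the SCE implementation: as an assumption for illustration, an ordinary least-squares fit (lm()) stands in for the SCE model and the absolute t-statistic stands in for Wilks' importance score. The helper name rfe_sketch() and its min_keep stopping threshold are hypothetical.

```r
# Minimal sketch of a generic recursive feature elimination (RFE) loop.
# ASSUMPTION: importance = absolute t-statistic from an lm() fit, used as a
# stand-in for the SCE model and Wilks' importance; RFE_SCE does not work this way.
rfe_sketch <- function(data, predictors, response, step = 1, min_keep = 2) {
  current <- predictors
  history <- list()
  while (length(current) > min_keep) {
    fit <- lm(reformulate(current, response = response), data = data)
    # Absolute t-values of the slope terms (intercept excluded) act as importance
    scores <- abs(summary(fit)$coefficients[current, "t value"])
    history[[length(history) + 1]] <- list(
      n_predictors = length(current),
      r2 = summary(fit)$r.squared,
      scores = scores
    )
    # Never drop so many predictors that fewer than min_keep remain
    n_drop <- min(step, length(current) - min_keep)
    to_drop <- names(sort(scores))[seq_len(n_drop)]  # lowest scores first
    current <- setdiff(current, to_drop)
  }
  list(selected = current, history = history)
}

# Simulated data: x1 and x2 drive y, x3..x5 are pure noise
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n, sd = 0.5)
res <- rfe_sketch(toy, paste0("x", 1:5), "y", step = 1)
res$selected
```

With two strongly informative predictors and three noise predictors, the loop should end with x1 and x2 retained after three fit-and-drop iterations.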
RFE_SCE(
Training_data,
Testing_data,
Predictors,
Predictant,
Nmin,
Ntree,
alpha = 0.05,
resolution = 1000,
step = 1,
verbose = TRUE,
parallel = TRUE
)
Plot_RFE(
rfe_result,
main = "Validation and Testing R2 vs Number of Predictors",
col_validation = "blue",
col_testing = "red",
pch = 16,
lwd = 2,
cex = 1.2,
legend_pos = "bottomleft",
...
)
Training_data
A data.frame containing the training data. Must include all specified predictors and predictants.
Testing_data
A data.frame containing the testing data. Must include all specified predictors and predictants.
Predictors
A character vector of the names of the independent variables to evaluate (e.g., c("Prcp","SRad","Tmax")). Must contain at least 2 elements.
Predictant
A character vector of the name(s) of the dependent variable(s) (e.g., c("swvl3","swvl4")). Must be non-empty.
Nmin
Integer specifying the minimum number of samples in a leaf node for splitting.
Ntree
Integer specifying the number of trees in the ensemble.
alpha
Numeric significance level for clustering, between 0 and 1. Default is 0.05.
resolution
Numeric value specifying the resolution for splitting. Default is 1000.
step
Integer specifying the number of predictors to remove at each iteration. Must be between 1 and (number of predictors - number of predictants). Default is 1.
verbose
Logical; whether to print progress information during RFE iterations. Default is TRUE.
parallel
Logical; whether to use parallel processing for SCE model construction. When TRUE, trees are built on multiple CPU cores for faster computation; when FALSE, trees are processed sequentially. Default is TRUE.
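The step constraint and its effect on the number of iterations can be sketched as follows. This assumes, for illustration, that exactly step predictors are removed per iteration and that the loop runs while the number of predictors exceeds the number of predictants plus 2 (the loop condition given in the process description); the helper rfe_schedule() is hypothetical, and RFE_SCE may handle the final iteration differently.

```r
# Sketch of the documented step-size rule and the resulting predictor counts.
# ASSUMPTIONS: exactly `step` predictors are removed per iteration, and the
# loop runs while n_predictors > n_predictants + 2. Hypothetical helper.
rfe_schedule <- function(n_predictors, n_predictants, step) {
  max_step <- n_predictors - n_predictants
  if (!is.numeric(step) || length(step) != 1 || step != round(step) ||
      step < 1 || step > max_step) {
    stop(sprintf("'step' must be an integer between 1 and %d", max_step))
  }
  counts <- c()
  n <- n_predictors
  while (n > n_predictants + 2) {
    counts <- c(counts, n)  # a model would be trained with n predictors
    n <- n - step
  }
  counts
}

# 22 predictors, 1 predictant, step = 3 (as in the Examples)
rfe_schedule(22, 1, 3)  # returns c(22, 19, 16, 13, 10, 7, 4)
```

Under these assumptions, a larger step trades resolution in the R2 curve for fewer (expensive) SCE fits.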
Plot_RFE Arguments:
rfe_result
The list returned by RFE_SCE, containing the summary and performances components.
main
Plot title. Default is "Validation and Testing R2 vs Number of Predictors".
col_validation
Color of the validation line. Default is "blue".
col_testing
Color of the testing line. Default is "red".
pch
Plotting character (symbol) for the markers. Default is 16 (filled circle).
lwd
Line width. Default is 2.
cex
Scaling factor for marker size. Default is 1.2.
legend_pos
Position of the legend (e.g., "bottomleft", "topright"). Default is "bottomleft".
...
Additional arguments passed to plot().
RFE_SCE Process: The RFE process involves the following steps:
1. Input validation:
   - Data frame structure validation
   - Predictor and predictant validation
   - Step size validation
2. Initialization:
   - Set up history tracking structures
   - Initialize the current predictor set
3. Main RFE loop (continues while the number of predictors > the number of predictants + 2):
   - Train an SCE model with the current predictors
   - Generate predictions using Model_simulation
   - Evaluate the model using SCE_Model_evaluation
   - Store performance metrics and importance scores
   - Remove the least important predictors based on Wilks' scores
The function handles:
   - Single and multiple predictants
   - Performance tracking across iterations
   - Importance score calculation
   - Step-wise predictor removal
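The last step of the main loop can be sketched as a small helper that keeps all but the step lowest-scoring predictors. The name drop_least_important(), the score values, and the use of a single score per predictor (even with multiple predictants) are assumptions for illustration; only the rule of removing the least important predictors comes from the description above.

```r
# Sketch of one elimination step: drop the `step` least important predictors.
# ASSUMPTIONS: `scores` is a named numeric vector (higher = more important)
# standing in for Wilks' scores; ties keep the original predictor order
# because order() is stable. Hypothetical helper, not part of SCE.
drop_least_important <- function(scores, step = 1) {
  stopifnot(is.numeric(scores), !is.null(names(scores)))
  ranked <- names(scores)[order(scores)]  # ascending: least important first
  # Always keep at least one predictor
  to_drop <- ranked[seq_len(min(step, length(scores) - 1))]
  setdiff(names(scores), to_drop)
}

scores <- c(Prcp = 0.42, SRad = 0.05, Tmax = 0.31, Tmin = 0.02)  # illustrative values
drop_least_important(scores, step = 2)  # returns c("Prcp", "Tmax")
```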
Plot_RFE Function: Creates a base R plot showing validation and testing R2 values as a function of the number of predictors during the RFE process. The function:
   - Extracts R2 values from the RFE results
   - Converts formatted strings to numeric values
   - Creates a line plot with points and lines
   - Includes a legend distinguishing validation and testing performance
   - Supports customization of colors, marker symbol and size, line width, and legend position
   - Uses only base R graphics (no external dependencies)
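A base R sketch of the plotting logic listed above follows. It mirrors the documented Plot_RFE arguments but is not the package's code: plot_r2_sketch() is a hypothetical name, the R2 vectors are made-up illustrative strings, and it assumes the R2 values are stored as formatted character strings.

```r
# Sketch of a Plot_RFE-like plot using only base R graphics.
# ASSUMPTION: R2 values arrive as formatted strings such as "0.71".
# Hypothetical helper; argument names follow the documented Plot_RFE ones.
plot_r2_sketch <- function(n_predictors, validation_r2, testing_r2,
                           main = "Validation and Testing R2 vs Number of Predictors",
                           col_validation = "blue", col_testing = "red",
                           pch = 16, lwd = 2, cex = 1.2,
                           legend_pos = "bottomleft", ...) {
  val <- as.numeric(validation_r2)  # formatted strings -> numeric
  tst <- as.numeric(testing_r2)
  # y-range covers both series so neither line is clipped
  plot(n_predictors, val, type = "b", col = col_validation, pch = pch,
       lwd = lwd, cex = cex, ylim = range(c(val, tst), na.rm = TRUE),
       xlab = "Number of predictors", ylab = "R2", main = main, ...)
  lines(n_predictors, tst, type = "b", col = col_testing, pch = pch,
        lwd = lwd, cex = cex)
  legend(legend_pos, legend = c("Validation", "Testing"),
         col = c(col_validation, col_testing), pch = pch, lwd = lwd)
  invisible(list(n_predictors = n_predictors,
                 validation_r2 = val, testing_r2 = tst))
}

pdf(NULL)  # null device so the sketch also runs non-interactively
out <- plot_r2_sketch(c(22, 19, 16),
                      c("0.71", "0.74", "0.73"),   # illustrative only
                      c("0.65", "0.69", "0.70"))
invisible(dev.off())
str(out)
```

Because xlab, ylab, and ylim are set inside the sketch, passing them again through ... would cause a duplicate-argument error.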
RFE_SCE: A list containing:
   - summary: a data.frame with columns:
       - n_predictors: number of predictors at each iteration
       - predictors: comma-separated list of the predictors used
   - performances: list of performance evaluations, one per iteration
       - for a single predictant: a performance data.frame
       - for multiple predictants: a named list of performance data.frames
   - importance_scores: list of Wilks' importance scores, one per iteration
Plot_RFE: Invisibly returns a list containing:
   - n_predictors: vector of predictor counts
   - validation_r2: vector of validation R2 values
   - testing_r2: vector of testing R2 values
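Because each element of performances differs in shape between single and multiple predictants, code that post-processes RFE_SCE output may need to handle both cases. A minimal sketch, assuming the structure listed above; get_performance() is hypothetical, and the mock tables use placeholder columns because the actual columns of the performance data.frames are not documented here.

```r
# Sketch: fetch one iteration's performance table for either output shape.
# ASSUMPTION: performances[[i]] is a data.frame (single predictant) or a
# named list of data.frames (multiple predictants). Hypothetical helper.
get_performance <- function(performances, iteration, predictant = NULL) {
  perf <- performances[[iteration]]
  if (is.data.frame(perf)) {
    return(perf)  # single predictant
  }
  if (is.null(predictant)) {
    stop("multiple predictants: supply 'predictant', one of: ",
         paste(names(perf), collapse = ", "))
  }
  perf[[predictant]]  # multiple predictants
}

# Mock objects with placeholder columns, for illustration only
single <- list(data.frame(metric = "R2", value = 0.7))
multi  <- list(list(swvl3 = data.frame(metric = "R2", value = 0.6),
                    swvl4 = data.frame(metric = "R2", value = 0.5)))
get_performance(single, 1)
get_performance(multi, 1, "swvl4")
```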
Kailong Li <lkl98509509@gmail.com>
See the generic functions importance and evaluate for SCE objects.
For visualization of RFE results, see Plot_RFE.
# # This example is computationally intensive and may take a long time to run.
# # It is recommended to run this example on a machine with a high-performance CPU.
#
# ## Load the SCE package and supporting packages
# library(SCE)
# library(parallel)
#
# data(Streamflow_training_22var)
# data(Streamflow_testing_22var)
#
# # Define predictors and predictants
# Predictors <- c(
# "Precipitation", "Radiation", "Tmax", "Tmin", "VP",
# "Precipitation_2Mon", "Radiation_2Mon", "Tmax_2Mon", "Tmin_2Mon", "VP_2Mon",
# "PNA", "Nino3.4", "IPO", "PDO",
# "PNA_lag1", "Nino3.4_lag1", "IPO_lag1", "PDO_lag1",
# "PNA_lag2", "Nino3.4_lag2", "IPO_lag2", "PDO_lag2"
# )
# Predictants <- c("Flow")
#
# # Perform RFE
# set.seed(123)
# result <- RFE_SCE(
# Training_data = Streamflow_training_22var,
# Testing_data = Streamflow_testing_22var,
# Predictors = Predictors,
# Predictant = Predictants,
# Nmin = 5,
# Ntree = 48,
# alpha = 0.05,
# resolution = 1000,
# step = 3, # Number of predictors to remove at each iteration
# verbose = TRUE,
# parallel = TRUE
# )
#
# ## Access results
# rfe_summary <- result$summary  # avoids reusing the name of base::summary()
# performances <- result$performances
# importance_scores <- result$importance_scores
#
# ## Plot RFE results
# Plot_RFE(result)
#
# ## Customized plot
# Plot_RFE(result,
# main = "My RFE Results",
# col_validation = "darkblue",
# col_testing = "darkred",
# lwd = 3,
# cex = 1.5)
#
# ## Note: The RFE_SCE function internally uses S3 methods for SCE models
# ## including importance() and evaluate() for model analysis
#
#