| rf | R Documentation |
Fits a random forest model using ranger and extends it with spatial diagnostics: residual autocorrelation (Moran's I) at multiple distance thresholds, performance metrics (RMSE, NRMSE via root_mean_squared_error()), and variable importance scores computed on scaled data (via scale).
rf(
  data = NULL,
  dependent.variable.name = NULL,
  predictor.variable.names = NULL,
  distance.matrix = NULL,
  distance.thresholds = NULL,
  xy = NULL,
  ranger.arguments = NULL,
  scaled.importance = FALSE,
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
data |
Data frame with a response variable and a set of predictors. Default: |
dependent.variable.name |
Character string with the name of the response variable. Must be a column name in |
predictor.variable.names |
Character vector with predictor variable names. All names must be columns in |
distance.matrix |
Square matrix with pairwise distances between observations in |
distance.thresholds |
Numeric vector of distance thresholds for spatial autocorrelation analysis. For each threshold, distances below that value are set to zero when computing Moran's I. If |
xy |
Data frame or matrix with two columns containing coordinates, named "x" and "y". Not used by this function but stored in the model for use by |
ranger.arguments |
Named list with ranger arguments. Arguments for this function can also be passed here. The default importance method is 'permutation' instead of ranger's default 'none'. The |
scaled.importance |
If |
seed |
Random seed for reproducibility. Default: |
verbose |
If |
n.cores |
Number of cores for parallel execution. Default: |
cluster |
Cluster object from |
See ranger documentation for additional details. The formula interface is supported via ranger.arguments, but variable interactions are not permitted. For feature engineering including interactions, see the_feature_engineer().
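As a rough illustration of what distance.thresholds controls, the base R sketch below computes Moran's I for a numeric vector after zeroing the distances below a threshold. It is not the code behind moran_multithreshold(): the function name moran_sketch is hypothetical, and the inverse-distance weights, the zero weight given to zeroed pairs, and the row standardisation are all assumptions made for this example.

```r
# Hypothetical helper, not part of the package.
# Assumptions: inverse-distance weights, zeroed pairs get weight 0,
# and weights are row-standardised.
moran_sketch <- function(x, distance.matrix, distance.threshold = 0) {
  d <- distance.matrix

  # distances below the threshold are set to zero
  d[d < distance.threshold] <- 0

  # inverse-distance weights; zero distances (including the diagonal)
  # yield Inf, which is replaced by a weight of 0
  w <- 1 / d
  w[!is.finite(w)] <- 0

  # row-standardise, guarding against rows without neighbours
  rs <- rowSums(w)
  rs[rs == 0] <- 1
  w <- w / rs

  # standard Moran's I formula on the centred values
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# toy data: four points on a line with alternating values
positions <- 1:4
d <- as.matrix(stats::dist(positions))
x <- c(1, -1, 1, -1)

# one Moran's I value per distance threshold
sapply(c(0, 1.5), function(th) moran_sketch(x, d, th))
```

In this toy case the sign of Moran's I changes with the threshold, which is why residual autocorrelation is worth checking at several neighbourhood distances rather than one.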
A ranger model object with additional slots:
ranger.arguments: Arguments used to fit the model.
importance: List with global importance data frame (predictors ranked by importance), importance plot, and local importance scores (per-observation difference in accuracy between permuted and non-permuted predictors, based on out-of-bag data).
performance: Model performance metrics including R-squared (out-of-bag and standard), pseudo R-squared, RMSE, and NRMSE.
residuals: Model residuals with normality diagnostics (residuals_diagnostics()) and spatial autocorrelation (moran_multithreshold()).
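For readers unfamiliar with the RMSE and NRMSE entries in the performance slot, here is a minimal base R sketch of both quantities. The helper names are hypothetical, and dividing by the interquartile range of the observations is only one common normalisation, assumed here for illustration; root_mean_squared_error() may use a different one.

```r
# Hypothetical helpers, not part of the package.

# root mean squared error between observations and predictions
rmse_sketch <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# RMSE normalised by the interquartile range of the observations
# (assumption: other normalisations, such as the mean or the range, exist)
nrmse_sketch <- function(observed, predicted) {
  rmse_sketch(observed, predicted) / stats::IQR(observed)
}

# toy data: a single prediction is off by 2 units
observed <- c(1, 2, 3, 4, 5)
predicted <- c(1, 2, 3, 4, 7)

rmse_sketch(observed, predicted)
nrmse_sketch(observed, predicted)
```

Because NRMSE is expressed relative to the spread of the response, it can be compared across models fitted to responses measured in different units.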
Other main_models:
rf_spatial()
data(
  plants_df,
  plants_response,
  plants_predictors,
  plants_distance
)
m <- rf(
  data = plants_df,
  dependent.variable.name = plants_response,
  predictor.variable.names = plants_predictors,
  distance.matrix = plants_distance,
  distance.thresholds = c(100, 1000, 2000),
  ranger.arguments = list(
    num.trees = 50,
    min.node.size = 20
  ),
  verbose = FALSE,
  n.cores = 1
)
class(m)
# variable importance
m$importance$per.variable
m$importance$per.variable.plot
# model performance
m$performance
# autocorrelation of residuals
m$residuals$autocorrelation$per.distance
m$residuals$autocorrelation$plot
# model predictions
m$predictions$values
# predictions for new data (using stats::predict)
y <- stats::predict(
  object = m,
  data = plants_df[1:5, ],
  type = "response"
)$predictions
# alternative: pass arguments via ranger.arguments list
args <- list(
  data = plants_df,
  dependent.variable.name = plants_response,
  predictor.variable.names = plants_predictors,
  distance.matrix = plants_distance,
  distance.thresholds = c(100, 1000, 2000),
  num.trees = 50,
  min.node.size = 20,
  num.threads = 1
)
m <- rf(
  ranger.arguments = args,
  verbose = FALSE
)