GWRFC: Geographically weighted Random Forest Classification

View source: R/GWRFC.R

GWRFC R Documentation

Geographically weighted Random Forest Classification

Description

GWRFC is a tool for analyzing and exploring spatial data. It uses geographically weighted (GW; Fotheringham et al. 1998) modeling to train local random forest (RF; Breiman 2001) models and reports them with partial dependence plots (PDP; Greenwell 2019). Prediction results and accuracy metrics (ACC) are reported as well.

Usage

GWRFC(
  input_shapefile,
  remove_columns = NA,
  dependent_varName,
  kernel_function = "exponential",
  kernel_adaptative = T,
  kernel_bandwidth,
  upsampling = T,
  save_models = F,
  enable_pdp = F,
  number_cores = 1,
  output_folder
)

Arguments

input_shapefile

string or Spatial-class. Input shapefile with dependent and independent variables. It can be the filename of the shapefile or an object of class SpatialPolygonsDataFrame or SpatialPointsDataFrame.

remove_columns

string. Variables to remove from input_shapefile, identified by column name. Use NA to skip column removal.

dependent_varName

string. Dependent variable name. It must exist in input_shapefile and should be categorical (with no more than 20 classes).

kernel_function

string. Kernel type to apply in GWRFC. It can be: 'gaussian', 'exponential', 'bisquare' or 'tricube'.
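For orientation, the four kernel names correspond to well-known distance-decay functions in the geographically weighted modeling literature. The sketch below uses the textbook forms of these functions; `gw_kernel_weights` is a hypothetical helper written for illustration, and GWRFC's internal implementation may differ in details such as scaling.

```r
# Textbook GW kernel weights for distances d and bandwidth b.
# Illustrative only: not taken from the GWRFC source code.
gw_kernel_weights <- function(d, b,
                              kernel = c("gaussian", "exponential",
                                         "bisquare", "tricube")) {
  kernel <- match.arg(kernel)
  u <- d / b  # distance scaled by the bandwidth
  switch(kernel,
    gaussian    = exp(-0.5 * u^2),
    exponential = exp(-u),
    bisquare    = ifelse(u < 1, (1 - u^2)^2, 0),  # zero beyond the bandwidth
    tricube     = ifelse(u < 1, (1 - u^3)^3, 0)   # zero beyond the bandwidth
  )
}

d <- c(0, 50, 100, 200)
w_exp <- gw_kernel_weights(d, b = 100, kernel = "exponential")
w_bsq <- gw_kernel_weights(d, b = 100, kernel = "bisquare")
```

The gaussian and exponential kernels give every observation a non-zero weight, while bisquare and tricube drop observations beyond the bandwidth entirely.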

kernel_adaptative

logical. If TRUE, the kernel is adaptive; otherwise it is treated as fixed (longer processing time).

kernel_bandwidth

numeric. Kernel bandwidth. If kernel_adaptative is TRUE, this is the number of local observations in the kernel; otherwise it is a distance.
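The difference between the two bandwidth modes can be sketched as follows. This assumes the common convention in which an adaptive bandwidth is the distance to the k-th nearest observation; `local_bandwidth` is a hypothetical helper, not a GWRFC function, and the package's own neighbor search may be implemented differently.

```r
# Hypothetical helper: bandwidth (as a distance) at every observation.
# coords: numeric matrix of x/y coordinates, one row per observation.
local_bandwidth <- function(coords, bandwidth, adaptive = TRUE) {
  if (!adaptive) {
    # Fixed kernel: the same distance is used everywhere.
    return(rep(bandwidth, nrow(coords)))
  }
  dmat <- as.matrix(dist(coords))   # pairwise Euclidean distances
  k <- min(bandwidth, nrow(coords)) # cannot exceed the number of observations
  # Adaptive kernel: distance to the k-th nearest observation
  # (each point counts itself at distance 0).
  apply(dmat, 1, function(row) sort(row)[k])
}

coords <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
bw_adaptive <- local_bandwidth(coords, bandwidth = 3, adaptive = TRUE)
bw_fixed    <- local_bandwidth(coords, bandwidth = 2.5, adaptive = FALSE)
```

With an adaptive kernel the isolated point at x = 10 receives a much larger distance than the clustered points, so every local model still sees the same number of observations.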

upsampling

logical. If TRUE, the data are upsampled before random forest training; otherwise they are downsampled. Upsampling is somewhat more computationally demanding but improves accuracy.
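As a rough illustration of class balancing by upsampling, the base R sketch below resamples each class with replacement until it matches the largest class. `upsample_classes` and the toy data are hypothetical; the routine GWRFC actually uses is not described here and may differ.

```r
# Hypothetical base-R upsampling: add resampled rows to every class
# until it is as large as the largest class.
upsample_classes <- function(data, class_col) {
  groups <- split(seq_len(nrow(data)), data[[class_col]], drop = TRUE)
  target <- max(lengths(groups))
  idx <- unlist(lapply(groups, function(rows) {
    extra <- target - length(rows)
    # Keep the original rows and draw the missing ones with replacement.
    c(rows, rows[sample.int(length(rows), extra, replace = TRUE)])
  }))
  data[idx, , drop = FALSE]
}

set.seed(42)
toy <- data.frame(x = 1:6,
                  fao = c("low", "low", "low", "low", "high", "high"))
balanced <- upsample_classes(toy, "fao")
table(balanced$fao)
```

Downsampling is the mirror image: each class is reduced to the size of the smallest class by sampling without replacement, which is cheaper but discards observations.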

save_models

logical. If TRUE, the random forest models are stored in output_folder as an RDS file. The file can be large, so make sure enough disk space is available; writing it can also delay the end of the run.
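Saved models can be read back into R with readRDS. The sketch below assumes only that the stored file has an .rds extension and lives in output_folder; the exact file name and the structure of the stored object are not documented here, so `load_saved_models` is a hypothetical helper that loads whatever RDS files it finds.

```r
# Hypothetical helper: load every .rds file found in the output folder.
# The file name pattern is an assumption, not documented GWRFC behavior.
load_saved_models <- function(output_folder) {
  rds_files <- list.files(output_folder, pattern = "\\.rds$",
                          ignore.case = TRUE, full.names = TRUE)
  models <- lapply(rds_files, readRDS)
  names(models) <- basename(rds_files)
  models
}

# Placeholder folder; replace tempdir() with your own output_folder.
models <- load_saved_models(tempdir())
```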

enable_pdp

logical. EXPERIMENTAL. If TRUE, the maximum YHAT of the partial dependence plots is calculated, together with the corresponding independent variable value (PDP).

number_cores

numeric. Number of cores for parallel processing. Cores are registered and operated via the doParallel, foreach and parallel packages. Be careful when increasing the number of cores, as RAM may become insufficient.
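One cautious way to choose a value is to cap the request at the number of cores the machine reports, leaving one free. `safe_cores` is a hypothetical helper built on parallel::detectCores; it does not account for RAM, which still has to be judged separately.

```r
# Hypothetical helper: cap the requested cores at what the machine offers,
# keeping `reserve` cores free and never returning less than 1.
safe_cores <- function(requested, reserve = 1L) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L  # detectCores() can return NA
  as.integer(max(1L, min(requested, available - reserve)))
}

n_cores <- safe_cores(3)
```

The result can then be passed as number_cores = n_cores.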

output_folder

string. Output folder where GWRFC outputs will be stored.

Value

As a result, four shapefiles are created whose prefixes refer to:

  1. LVI: Local variable importance. Calculated via permutation for each variable.

  2. PDP: Local maxima of the independent variables (class or value). Identified where YHAT reaches its maximum during RF model marginalization. Calculated for each variable.

  3. YHAT: Prediction result for dependent_varName when the PDP local maximum is applied. Calculated for each variable.

  4. ACC: Predictions and accuracies: predicted class, kappa from Out-of-Bag, class probabilities and prediction failures.
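For readers unfamiliar with the kappa statistic listed under ACC, the sketch below computes the standard Cohen's kappa from observed and predicted classes. How GWRFC derives its kappa from the Out-of-Bag predictions is not detailed here, so `cohen_kappa` and the toy vectors are illustrative assumptions only.

```r
# Standard Cohen's kappa: agreement corrected for chance agreement.
cohen_kappa <- function(observed, predicted) {
  observed  <- as.character(observed)
  predicted <- as.character(predicted)
  lv  <- union(observed, predicted)  # shared set of class labels
  tab <- table(factor(observed, levels = lv),
               factor(predicted, levels = lv))
  n     <- sum(tab)
  p_obs <- sum(diag(tab)) / n                        # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

observed    <- c("a", "a", "b", "b", "c", "c")
predicted   <- c("a", "a", "b", "c", "c", "c")
kappa_value <- cohen_kappa(observed, predicted)  # 0.75 for this toy example
```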

In all shapefiles, a column called 'ID_row' refers to the row names of input_shapefile. In addition, processing progress can be monitored in output_folder through the file data_progress.txt.
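A long run can be checked from a second R session by reading the end of the progress file. The sketch below assumes only that data_progress.txt is a plain text file inside output_folder; its line format is not documented here, and `read_progress` is a hypothetical helper.

```r
# Hypothetical helper: return the last n lines of the progress file,
# or an empty character vector if the file does not exist yet.
read_progress <- function(output_folder, n = 5) {
  progress_file <- file.path(output_folder, "data_progress.txt")
  if (!file.exists(progress_file)) {
    return(character(0))
  }
  utils::tail(readLines(progress_file, warn = FALSE), n)
}

# Placeholder folder; replace tempdir() with your own output_folder.
progress <- read_progress(tempdir())
```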

Examples


# load the required packages
library(GWRFC)
library(tmap)

# view deforestation data
data("deforestation")
tmap_mode("view")
tm_basemap("OpenStreetMap") +
  tm_shape(deforestation) +
  tm_polygons(col = "fao", style = "cat",
              title = "Annual deforestation rate 2000-2010 (FAO) - categorical (quantiles)",
              palette = "YlOrRd")

# run GWRFC
GWRFC(input_shapefile = deforestation, # a spatial data frame (points or polygons) or the full filename of the shapefile to analyze.
      remove_columns = c("ID_grid", "L_oth"), # removes variables that are not informative. Use NA to skip removal.
      dependent_varName = "fao", # the dependent variable to evaluate. It should be of factor or character data type.
      kernel_function = "exponential", # the weighting function. See help for other available functions.
      kernel_adaptative = TRUE, # TRUE for an adaptive kernel distance, FALSE for a fixed kernel distance.
      kernel_bandwidth = 400, # as the kernel is adaptive, 400 is the minimum number of observations to use in modelling.
      upsampling = TRUE, # improves accuracy (recommended) but is somewhat more computationally costly.
      save_models = TRUE, # saves the RF models. Beware of hard disk space and extra processing time.
      enable_pdp = FALSE, # experimental; use with caution, as it is sensitive to noise.
      number_cores = 3, # number of CPU cores to use.
      output_folder = "E:/demo/deforestation") # check this folder for GWRFC outputs.


FSantosCodes/GWRFC documentation built on Sept. 24, 2023, 6:07 a.m.