pred.rfsi: Random Forest Spatial Interpolation (RFSI) prediction

pred.rfsi R Documentation

Random Forest Spatial Interpolation (RFSI) prediction

Description

Function for spatial or spatio-temporal prediction based on a Random Forest Spatial Interpolation (RFSI) model (Sekulić et al. 2020).

Usage

pred.rfsi(model,
          data,
          obs.col=1,
          data.staid.x.y.z = NULL,
          newdata,
          newdata.staid.x.y.z = NULL,
          z.value = NULL,
          s.crs = NA,
          newdata.s.crs=NA,
          p.crs = NA,
          output.format = "data.frame",
          cpus = detectCores()-1,
          progress = TRUE,
          soil3d = FALSE, # soil RFSI
          depth.range = 0.1, # in units of depth
          no.obs = 'increase',
          ...)

Arguments

model

ranger; An RFSI model fitted by the rfsi function.

data

sf-class, sftime-class, SpatVector-class or data.frame; Contains the target variable (observations) and covariates used for RFSI prediction. If a data.frame object, it should have the following columns: station ID (staid), longitude (x), latitude (y), 3rd component - time, depth, ... (z) of the observation, and observation value (obs).

obs.col

numeric or character; Name or position of the observation column in data. Default is 1.

data.staid.x.y.z

numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and 3rd component (z) columns in a data.frame object (e.g. c(1,2,3,4)). If data is an sf-class, sftime-class, or SpatVector-class object, data.staid.x.y.z is used to point to the staid and z positions. Set the z position to NA (e.g. c(1,2,3,NA)) or omit it (e.g. c(1,2,3)) for spatial interpolation. Default is NULL.
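
As an illustration of the data.frame layout, the following sketch builds a small table and the matching data.staid.x.y.z vector. The column names and values are made up for the example; only the ordering convention (staid, x, y, z) comes from the description above.

```r
# Hypothetical observations: station ID, projected x/y coordinates, and a value.
obs_df <- data.frame(
  id   = 1:4,
  x    = c(181072, 181025, 181165, 181298),
  y    = c(333611, 333558, 333537, 333484),
  zinc = c(1022, 1141, 640, 257)
)

# Purely spatial case: point at staid, x, y by name and set the z position to NA ...
staid_xyz <- c("id", "x", "y", NA)
# ... or, as the description above allows, omit the z position.
staid_xy <- c("id", "x", "y")

# The same specification expressed as column positions instead of names.
staid_xyz_pos <- match(staid_xyz, names(obs_df))
print(staid_xyz_pos)
```

With this layout, obs.col would be "zinc" (or 4), since the observation column is not part of data.staid.x.y.z.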

newdata

sf-class, sftime-class, SpatVector-class, SpatRaster-class or data.frame; Contains prediction locations and covariates used for RFSI prediction. If a data.frame object, it should have the following columns: prediction location ID (staid), longitude (x), latitude (y), 3rd component - time, depth, ... (z), and covariates (cov1, cov2, ...). Covariate names have to be the same as in the model.

newdata.staid.x.y.z

numeric or character vector; Positions or names of the prediction location ID (staid), longitude (x), latitude (y) and 3rd component (z) columns in the data.frame newdata object (e.g. c(1,2,3,4)). If newdata is an sf-class, sftime-class, SpatVector-class or SpatRaster-class object, newdata.staid.x.y.z is used to point to the staid and z positions. Set the z position to NA (e.g. c(1,2,3,NA)) or omit it (e.g. c(1,2,3)) for spatial interpolation. Default is NULL.

z.value

vector; A vector of 3rd component - time, depth, ... (z) values if newdata is SpatRaster-class.

s.crs

st_crs or crs; Source CRS of data. If data contains crs, s.crs will not be used. Default is NA.

newdata.s.crs

st_crs or crs; Source CRS of newdata. If newdata contains crs, newdata.s.crs will not be used. Default is NA.

p.crs

st_crs or crs; Projection CRS for data reprojection. If NA, s.crs will be used for distance calculation. Note that observations should be in a projected CRS, because nearest observations are found based on Euclidean distances (see function near.obs). Default is NA.
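
To make the role of projected coordinates concrete, here is a minimal base R sketch of a nearest-observation search by Euclidean distance. It is not the near.obs implementation; the helper nearest_obs_sketch and the coordinates are hypothetical. Distances computed this way are only meaningful when x and y are in a projected CRS with linear units (e.g. metres), not in degrees.

```r
# Hypothetical helper (not part of meteo): indices of and distances to the
# n.obs observations closest to a single location, by Euclidean distance.
nearest_obs_sketch <- function(loc_xy, obs_xy, n.obs = 2) {
  # Euclidean distance from the location to every observation
  d <- sqrt((obs_xy[, 1] - loc_xy[1])^2 + (obs_xy[, 2] - loc_xy[2])^2)
  # keep the n.obs smallest distances (or all of them if there are fewer)
  idx <- order(d)[seq_len(min(n.obs, length(d)))]
  data.frame(obs_index = idx, dist = d[idx])
}

# Made-up projected coordinates in metres
obs_xy <- cbind(x = c(0, 300, 0, 1000), y = c(0, 400, 100, 1000))
nn <- nearest_obs_sketch(loc_xy = c(0, 0), obs_xy = obs_xy, n.obs = 3)
print(nn)
```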

output.format

character; Format of the output, data.frame (default), sf-class, sftime-class, SpatVector-class, or SpatRaster-class.

cpus

numeric; Number of processing units. Default is detectCores()-1.

progress

logical; Whether a progress bar is shown. Default is TRUE.

soil3d

logical; Whether 3D soil modelling is performed, in which case the near.obs.soil function is used for finding the n nearest observations and the distances to them. In this case, the z position of data.staid.x.y.z points to the depth column.

depth.range

numeric; Depth range around the location mid depth in which to search for nearest observations (see function near.obs.soil). It is given in the same units as the mid depth. Default is 0.1.

no.obs

character; Possible values are increase (default) and exactly. If set to increase and there are fewer than n.obs observations within depth.range for a specific location, depth.range is increased (multiplied by 2, 3, ...) until the number of observations is larger than or equal to n.obs. If set to exactly, the function raises an error at the first location with fewer than n.obs observations within the specified depth.range (see function near.obs.soil).
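
The two no.obs modes can be sketched in base R as follows. This is an illustration of the behaviour described above, not the near.obs.soil code: the helper widen_depth_range is hypothetical, and it assumes that "within depth.range" means an absolute depth difference no larger than the range.

```r
# Hypothetical helper illustrating the two no.obs modes.
widen_depth_range <- function(obs_depth, loc_depth, n.obs, depth.range,
                              no.obs = c("increase", "exactly")) {
  no.obs <- match.arg(no.obs)
  k <- 1
  repeat {
    # assumption: an observation is "in range" if its depth differs from the
    # location depth by at most k * depth.range
    in_range <- abs(obs_depth - loc_depth) <= k * depth.range
    if (sum(in_range) >= n.obs) {
      return(list(range_used = k * depth.range, index = which(in_range)))
    }
    if (no.obs == "exactly") {
      stop("fewer than n.obs observations within depth.range")
    }
    if (n.obs > length(obs_depth)) {
      stop("n.obs exceeds the total number of observations")
    }
    k <- k + 1  # multiply the original range by 2, 3, ...
  }
}

# Made-up depths, in the same units as depth.range
obs_depth <- c(0.05, 0.12, 0.25, 0.45, 0.80)
res <- widen_depth_range(obs_depth, loc_depth = 0.10, n.obs = 4, depth.range = 0.1)
print(res$range_used)
```

With no.obs = "exactly" the same call stops with an error, because only two observations fall within the original range of 0.1.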

...

Further arguments passed to the predict.ranger function, such as type = "quantiles" and quantiles = c(0.1,0.5,0.9) for quantile regression, etc.

Value

A data.frame, sf-class, sftime-class, SpatVector-class, or SpatRaster-class object (depending on the output.format argument) with the prediction column pred, or quantile..X.X columns in the case of quantile regression.
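
The double dot in quantile..X.X is consistent with how base R sanitises column names. The sketch below assumes the raw quantile labels look like "quantile= 0.1"; that label format is an assumption for illustration and is not stated on this page.

```r
# Assumed raw labels for three quantiles (hypothetical format)
raw_names <- paste("quantile=", c(0.1, 0.5, 0.9))

# make.names() replaces characters that are invalid in syntactic names
# (here "=" and the space) with dots
clean_names <- make.names(raw_names)
print(clean_names)
```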

Author(s)

Aleksandar Sekulic asekulic@grf.bg.ac.rs

References

Sekulić, A., Kilibarda, M., Heuvelink, G. B., Nikolić, M. & Bajat, B. Random Forest Spatial Interpolation. Remote Sens. 12, 1687, https://doi.org/10.3390/rs12101687 (2020).

See Also

near.obs rfsi tune.rfsi cv.rfsi

Examples

library(ranger)
library(sp)
library(sf)
library(terra)
library(meteo)

# prepare data
demo(meuse, echo=FALSE)
meuse <- meuse[complete.cases(meuse@data),]
data = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
# data = terra::vect(meuse)
# data.frame
# data <- as.data.frame(meuse)
# data$id = 1:nrow(data)
# data.staid.x.y.z <- c("id","x","y",NA)
fm.RFSI <- as.formula("zinc ~ dist + soil + ffreq")

# fit the RFSI model
rfsi_model <- rfsi(formula = fm.RFSI,
                   data = data, # meuse.df (use data.staid.x.y.z)
                   # data.staid.x.y.z = data.staid.x.y.z, # only if class(data) == data.frame
                   n.obs = 5, # number of nearest observations
                   # s.crs = st_crs(data), # needed only if the coordinates are lon/lat (WGS84)
                   # p.crs = st_crs(data), # needed only if the coordinates are lon/lat (WGS84)
                   cpus = detectCores()-1,
                   progress = TRUE,
                   # ranger parameters
                   importance = "impurity",
                   seed = 42,
                   num.trees = 250,
                   mtry = 5,
                   splitrule = "variance",
                   min.node.size = 5,
                   sample.fraction = 0.95,
                   quantreg = FALSE)
                   # quantreg = TRUE) # for quantile regression

rfsi_model
# OOB prediction error (MSE):       47758.14 
# R squared (OOB):                  0.6435869 
sort(rfsi_model$variable.importance)
sum("obs" == substr(rfsi_model$forest$independent.variable.names, 1, 3))

# Make RFSI prediction
# data.frame
# newdata <- as.data.frame(meuse.grid)
# newdata$id <- 1:nrow(newdata)
# newdata <- meuse.grid
newdata <- terra::rast(meuse.grid)
class(newdata)

# prediction
rfsi_prediction <- pred.rfsi(model = rfsi_model,
                             data = data, # meuse.df (use data.staid.x.y.z)
                             obs.col = "zinc",
                             # data.staid.x.y.z = data.staid.x.y.z, # data.frame
                             newdata = newdata, # meuse.grid.df (use newdata.staid.x.y.z)
                             # newdata.staid.x.y.z = c("id", "x", "y", NA), # data.frame
                             output.format = "SpatRaster", # "sf", # "SpatVector", 
                             zero.tol = 0,
                             # s.crs = st_crs(data), # meuse@proj4string, # NA # st_crs(data)
                             # newdata.s.crs = st_crs(data), # meuse@proj4string, # NA
                             # p.crs = st_crs(data), # meuse@proj4string, # NA
                             cpus = 1, # detectCores()-1,
                             progress = TRUE
                             # for quantile regression, add a comma above and uncomment:
                             # type = "quantiles",
                             # quantiles = c(0.1, 0.5, 0.9)
)
class(rfsi_prediction)
names(rfsi_prediction)
# head(rfsi_prediction)

# plot(rfsi_prediction)
# plot(rfsi_prediction['pred'])
# plot(rfsi_prediction['quantile..0.1'])
# plot(rfsi_prediction['quantile..0.5'])
# plot(rfsi_prediction['quantile..0.9'])

meteo documentation built on Nov. 23, 2023, 3:01 p.m.