Home

/

GitHub

/

hzambran/RFmerge

/

RFmerge: Merging of satellite datasets with ground observations using...

RFmerge: Merging of satellite datasets with ground observations using...
In hzambran/RFmerge: Merging of Satellite Datasets with Ground Observations using Random Forests

View source: R/RFmerge.R

RFmerge

R Documentation

Merging of satellite datasets with ground observations using Random Forests (RF)

Description

Merging of satellite datasets with ground observations using Random Forests (RF)

Usage

RFmerge(x, ...)

## Default S3 method:
RFmerge(x, metadata, cov, mask, training, 
        id="id", lat = "lat", lon = "lon", ED = TRUE, 
        seed = NULL, ntree = 2000, na.action = stats::na.omit,
        parallel=c("none", "parallel", "parallelWin"),
        par.nnodes=parallel::detectCores()-1, 
        par.pkgs= c("terra", "randomForest", "zoo"), write2disk=FALSE,
        drty.out, use.pb=TRUE, verbose=TRUE,...)

## S3 method for class 'zoo'
RFmerge(x, metadata, cov, mask, training, 
        id="id", lat = "lat", lon = "lon", ED = TRUE, 
        seed = NULL, ntree = 2000, na.action = stats::na.omit,
        parallel=c("none", "parallel", "parallelWin"),
        par.nnodes=parallel::detectCores()-1, 
        par.pkgs= c("terra", "randomForest", "zoo"), write2disk=FALSE,
        drty.out, use.pb=TRUE, verbose=TRUE, ...)

Arguments

`x`	data.frame with the ground-based values that will be used as the dependent variable to train the RF model. Every column must represent one ground-based station and the codes of the stations must be provided as colnames. class(data) must be zoo.
`metadata`	data.frame with the metadata of the ground-based stations. At least, it MUST have the following 3 columns: -) `id`: This column stores the unique identifier (ID) of each ground-based observation. Default value is `"id"`. -) `lat`: This column stores the latitude of each ground observation. Default value is `"lat"`. -) `lon`: This column stores the longitude of each ground observation. Default value is `"lon"`.
`cov`	List with all the covariates used as independent variables in the Random Forest model. The individual covariates can be a `RasterStack` or `RasterBrick` object when they vary in time, or they can be a single `RasterLayer` object when they do not change in time (e.g., a digital elevation model). All time-varying covariates in `cov` MUST have the same number of layers (bands). Covariates that do not change in time (e.g., a DEM) are internally transformed into `RasterStack` or `RasterBrick` objects with the same number of layers as the other time-varying elements in `cov`
`mask`	OPTIONAL. If provided, the final merged product mask out all values in `cov` outside `mask`. Spatial object (vectorial) with the spatial borders of the study area (e.g., catchment, administrative borders). `class(mask)` must be a `sf` object with "POLYGON" or "MULTIPOLYGON" geometry.
`training`	Numeric indicating the percentage of stations that will be used in the training set. The valid range is from zero to one. If `training` = 1, all the stations will be used for training purposes.
`id`	Character, with the name of the column in `metadata` where the identification code (ID) of each station is stored.
`lat`	Character, with the name of the column in `metadata` where the latitude of the stations is stored.
`lon`	Character, with the name of the column in `metadata` where the longitude of the stations is stored.
`ED`	logical, should the Euclidean distances be computed an used as covariates in the random forest model?. The default value is `TRUE`.
`seed`	Numeric, indicating a single value, interpreted as an integer, or null.
`parallel`	character, indicates how to parallelise ‘RFmerge’ (to be precise, only the evaluation of the objective function `fn` is parallelised). Valid values are: -)`none`: no parallelisation is made (this is the default value) -)`parallel`: parallel computations for network clusters or machines with multiple cores or CPUs. A ‘FORK’ cluster is created with the `makeForkCluster` function. When `fn.name="hydromod"` the evaluation of the objective function `fn` is done with the `clusterApply` function of the parallel package. When `fn.name!="hydromod"` the evaluation of the objective function `fn` is done with the `parRapply` function of the parallel package. -)`parallelWin`: parallel computations for network clusters or machines with multiple cores or CPUs (this is the only parallel implementation that works on Windows machines). A ‘PSOCK’ cluster is created with the `makeCluster` function. When `fn.name="hydromod"` the evaluation of the objective function `fn` is done with the `clusterApply` function of the parallel package. When `fn.name!="hydromod"` the evaluation of the objective function `fn` is done with the `parRapply` function of the parallel package.
`par.nnodes`	OPTIONAL. Used only when `parallel!='none'` numeric, indicates the number of cores/CPUs to be used in the local multi-core machine, or the number of nodes to be used in the network cluster. By default `par.nnodes` is set to the amount of cores detected by the function `detectCores()` (parallel package)
`par.pkgs`	OPTIONAL. Used only when `parallel='parallelWin'` list of package names (as characters) that need to be loaded on each node for allowing the objective function `fn` to be evaluated. By default `c("raster", "randomForest", "zoo")`.
`ntree`	Numeric indicating the maximum number trees to grow in the Random Forest algorithm. The default value is set to 2000. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. If this value is too low, the prediction may be biased.
`na.action`	A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)
`write2disk`	logical, indicates if the output merged raster layers and the training and evaluation datasets (two files each, one with time series and other with metadata) will be written to the disk. By default `write2disk=FALSE`
`drty.out`	Character with the full path to the directory where the final merged product will be exported as well as the training and evaluation datasets. Only used when `write2disk=TRUE`
`use.pb`	logical, indicates if a progress bar should be used to show the progress of the random forest computations (it might reduce a bit the performance of the computations, but it is useful to track if everything is working well). By default `use.pb=TRUE`
`verbose`	logical, indicates if progress messages are to be printed. By default `verbose=TRUE`
`...`	further arguments to be passed to the low level function `randomForest.default`.

Value

It returns a RasterStack object with as many layers as time steps are present in x. Each one of the layers in the output object has the same spatial resolution and spatial extent as the cov argument.

Author(s)

Oscar M. Baez-Villanueva, obaezvil@th-koeln.de
Mauricio Zambrano-Bigiarini, mzb.devel@gmail
Juan D. Giraldo-Osorio, j.giraldoo@javeriana.edu.co

References

Baez-Villanueva, O. M.; Zambrano-Bigiarini, M.; Beck, H.; McNamara, I.; Ribbe, L.; Nauditt, A.; Birkel, C.; Verbist, K.; Giraldo-Osorio, J.D.; Thinh, N.X. (2020). RF-MEP: a novel Random Forest method for merging gridded precipitation products and ground-based measurements, Remote Sensing of Environment, 239, 111610. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.rse.2019.111606")}. <https://authors.elsevier.com/c/1aKrd7qzSnJWL>.

Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gr\"aler, B. (2018). Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ, 6, e5518.

Examples

library(terra)

data(ValparaisoPPts)    
data(ValparaisoPPgis) 
   
ValparaisoSHP.fname <- system.file("extdata/ValparaisoSHP.shp",package="RFmerge")
ValparaisoSHP       <- terra::vect(ValparaisoSHP.fname)

chirps.fname   <- system.file("extdata/CHIRPS5km.tif",package="RFmerge")
prsnncdr.fname <- system.file("extdata/PERSIANNcdr5km.tif",package="RFmerge")
dem.fname      <- system.file("extdata/ValparaisoDEM5km.tif",package="RFmerge")

CHIRPS5km        <- rast(chirps.fname)
PERSIANNcdr5km   <- rast(prsnncdr.fname)
ValparaisoDEM5km <- rast(dem.fname)

covariates <- list(chirps=CHIRPS5km, persianncdr=PERSIANNcdr5km, 
                   dem=ValparaisoDEM5km)



# The following code assumes that the region is small enough for neglecting
# the impact of computing Euclidean distances in geographical coordinates.
# If this is not the case, please read the vignette 'Tutorial for merging 
# satellite-based precipitation datasets with ground observations using RFmerge' 


# without using parallelisation
rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis, cov=covariates, 
                id="Code", lat="lat", lon="lon", mask=ValparaisoSHP, training=1)


# Detecting if your OS is Windows or GNU/Linux, 
# and setting the 'parallel' argument accordingly:
onWin <- ( (R.version$os=="mingw32") | (R.version$os=="mingw64") )
ifelse(onWin, parallel <- "parallelWin", parallel <- "parallel")

#Using parallelisation, with a maximum number of nodes/cores to be used equal to 2:
par.nnodes <- min(parallel::detectCores()-1, 2)
rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis, cov=covariates,
                 id="Code", lat="lat", lon="lon", mask=ValparaisoSHP, 
                 training=0.8, parallel=parallel, par.nnodes=par.nnodes)

hzambran/RFmerge documentation built on Feb. 4, 2024, 1:59 a.m.

hzambran/RFmerge index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

hzambran/RFmerge
Merging of Satellite Datasets with Ground Observations using Random Forests

RFmerge: Merging of satellite datasets with ground observations using...
In hzambran/RFmerge: Merging of Satellite Datasets with Ground Observations using Random Forests

Merging of satellite datasets with ground observations using Random Forests (RF)

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Related to RFmerge in hzambran/RFmerge...

R Package Documentation

Browse R Packages

We want your feedback!

hzambran/RFmerge Merging of Satellite Datasets with Ground Observations using Random Forests

RFmerge: Merging of satellite datasets with ground observations using... In hzambran/RFmerge: Merging of Satellite Datasets with Ground Observations using Random Forests

Merging of satellite datasets with ground observations using Random Forests (RF)

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Related to RFmerge in hzambran/RFmerge...

R Package Documentation

Browse R Packages

We want your feedback!

hzambran/RFmerge
Merging of Satellite Datasets with Ground Observations using Random Forests

RFmerge: Merging of satellite datasets with ground observations using...
In hzambran/RFmerge: Merging of Satellite Datasets with Ground Observations using Random Forests