train.spLearner

Description

Automated spatial predictions and/or interpolation using Ensemble Machine Learning. Extends the functionality of the mlr package. Suitable for predicting numeric, binomial and factor-type variables.
Usage

## S4 method for signature 'SpatialPointsDataFrame,ANY,SpatialPixelsDataFrame'
train.spLearner(
observations,
formulaString,
covariates,
SL.library,
family = stats::gaussian(),
method = "stack.cv",
predict.type,
super.learner = "regr.lm",
subsets = 5,
lambda = 0.5,
cov.model = "exponential",
subsample = 10000,
parallel = "multicore",
oblique.coords = TRUE,
nearest = FALSE,
buffer.dist = FALSE,
theta.list = seq(0, 180, length.out = 14) * pi/180,
spc = TRUE,
id = NULL,
weights = NULL,
n.obs = 10,
...
)
Arguments

observations    SpatialPointsDataFrame.
formulaString   ANY (model formula).
covariates      SpatialPixelsDataFrame.
SL.library      List of learners.
family          Family, e.g. stats::gaussian().
method          Ensemble stacking method (see mlr::makeStackedLearner); usually "stack.cv".
predict.type    Prediction type: "prob" or "response".
super.learner   Ensemble stacking model; usually "regr.lm".
subsets         Number of subsets for repeated cross-validation.
lambda          Target variable transformation (0.5 or 1).
cov.model       Covariance model for variogram fitting.
subsample       For large datasets, consider randomly subsetting the training data.
parallel        Logical; initiate parallel processing.
oblique.coords  Specify whether to use oblique coordinates as covariates.
nearest         Specify whether to use nearest values and distances, i.e. the method of Sekulic et al. (2020).
buffer.dist     Specify whether to use buffer distances to points as covariates.
theta.list      List of angles (in radians) used to derive oblique coordinates.
spc             Specify whether to apply principal components transformation.
id              Id column name to control clusters of data.
weights         Optional weights (per row) that learners will use to account for variable data quality.
n.obs           Number of nearest observations to be found (see the meteo package).
...             Other arguments that can be passed on to mlr::makeStackedLearner.
Value

Object of class spLearner, which contains the fitted model, the variogram model and the spatial grid used for cross-validation.
Note

By default, oblique (rotated) coordinates are used as covariates, as described in Moller et al. (2020; doi: 10.5194/soil-6-269-2020), to account for the geographical distribution of values. By setting nearest = TRUE, distances to the nearest observations and the values of the nearest neighbors are used instead (see Sekulic et al., 2020; doi: 10.3390/rs12101687); this method closely resembles geostatistical interpolators such as kriging. Buffer (geographical) distances can be added by setting buffer.dist = TRUE. Using oblique coordinates and/or buffer distances is not recommended for point data sets with distinct spatial clustering. The effects of adding geographical distances to the model are explained in detail in Hengl et al. (2018; doi: 10.7717/peerj.5518) and Sekulic et al. (2020; doi: 10.3390/rs12101687).
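The rotated-coordinate idea can be sketched in a few lines of base R: each angle theta in theta.list yields one covariate, the projection of the point coordinates onto the axis rotated by theta. The helper oblique_coords below is a hypothetical illustration (not a landmap function), following the formula x*cos(theta) + y*sin(theta) used for oblique geographic coordinates.

```r
## Hypothetical sketch of oblique (rotated) coordinate covariates.
## Each angle theta produces one covariate: x*cos(theta) + y*sin(theta).
oblique_coords <- function(xy, theta.list = seq(0, 180, length.out = 14) * pi/180) {
  sapply(theta.list, function(theta) xy[, 1] * cos(theta) + xy[, 2] * sin(theta))
}

## Three toy points; the default theta.list gives 14 covariates per point:
xy <- cbind(x = c(0, 1, 0), y = c(0, 0, 1))
ogc <- oblique_coords(xy)
dim(ogc)      # 3 points x 14 angles
ogc[, 1]      # theta = 0 reproduces the x coordinate: 0 1 0
```

In train.spLearner these derived columns are added to the covariate stack so that tree-based learners can pick up directional spatial trends.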
Default learners used for regression are c("regr.ranger", "regr.ksvm", "regr.nnet", "regr.cvglmnet"). Default learners used for classification / binomial variables are c("classif.ranger", "classif.svm", "classif.multinom"), with predict.type = "prob". When using method = "stack.cv", each training and prediction round can produce somewhat different results due to the randomization of the cross-validation folds.
Prediction errors are derived by default using the forestError package, following the method described in Lu & Hardin (2021). Optionally, the quantreg (quantile regression) option of the ranger package (Meinshausen, 2006) can be used instead.
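The quantile-regression idea behind that option can be illustrated with base R alone: collect an ensemble of predictions for one location (here simulated toy numbers standing in for per-tree predictions) and take lower/upper quantiles as the interval bounds. The alpha = 0.318 level matches the prediction-interval plots in the Examples below and corresponds to roughly +/- 1 standard deviation.

```r
## Toy sketch of quantile-based prediction intervals (not landmap code):
## simulated values stand in for per-tree predictions at one location.
set.seed(1)
tree.preds <- rnorm(500, mean = 100, sd = 10)

## alpha = 0.318 gives the central 68.2% interval, i.e. approx. +/- 1 SD:
alpha <- 0.318
q <- quantile(tree.preds, probs = c(alpha/2, 1 - alpha/2))
q   # lower bound below 100, upper bound above 100
```

The real quantreg machinery in ranger uses the distribution of training responses within terminal nodes rather than simulated draws, but the interval construction from quantiles is the same.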
References

Moller, A. B., Beucher, A. M., Pouladi, N., and Greve, M. H. (2020). Oblique geographic coordinates as covariates for digital soil mapping. SOIL, 6, 269–289. doi: 10.5194/soil-6-269-2020
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Graler, B. (2018). Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ, 6:e5518. doi: 10.7717/peerj.5518
Lu, B., and Hardin, J. (2021). A unified framework for random forest prediction error estimation. Journal of Machine Learning Research, 22(8), 1–41. https://jmlr.org/papers/v22/18-558.html
Meinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7(Jun), 983–999. https://jmlr.org/papers/v7/meinshausen06a.html
Sekulic, A., Kilibarda, M., Heuvelink, G. B., Nikolic, M., and Bajat, B. (2020). Random Forest Spatial Interpolation. Remote Sensing, 12, 1687. doi: 10.3390/rs12101687
Examples

library(rgdal)
library(mlr)
library(rpart)
library(nnet)
demo(meuse, echo=FALSE)
## Regression:
sl = c("regr.rpart", "regr.nnet", "regr.glm")
system.time( m <- train.spLearner(meuse["lead"],
covariates=meuse.grid[,c("dist","ffreq")],
oblique.coords = FALSE, lambda=0,
parallel=FALSE, SL.library=sl) )
summary(m@spModel$learner.model$super.model$learner.model)
## Not run:
library(plotKML)
## regression-matrix:
str(m@vgmModel$observations@data)
meuse.y <- predict(m, error.type="weighted.sd")
plot(raster::raster(meuse.y$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
main="Predictions spLearner", axes=FALSE, box=FALSE)
## Regression with default settings:
m <- train.spLearner(meuse["zinc"], covariates=meuse.grid[,c("dist","ffreq")],
parallel=FALSE, lambda = 0)
## Ensemble model (meta-learner):
summary(m@spModel$learner.model$super.model$learner.model)
meuse.y <- predict(m)
## Plot of predictions and prediction error (RMSPE)
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(meuse.y$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
main="Predictions spLearner", axes=FALSE, box=FALSE)
points(meuse, pch="+")
plot(raster::raster(meuse.y$pred["model.error"]), col=rev(bpy.colors()),
main="Prediction errors", axes=FALSE, box=FALSE)
points(meuse, pch="+")
par(op)
while (!is.null(dev.list())) dev.off()
## Plot of prediction intervals:
pts = list("sp.points", meuse, pch = "+", col="black")
spplot(meuse.y$pred[,c("q.lwr","q.upr")], col.regions=plotKML::R_pal[["rainbow_75"]][4:20],
sp.layout = list(pts),
main="Prediction intervals (alpha = 0.318)")
while (!is.null(dev.list())) dev.off()
## Method from https://doi.org/10.3390/rs12101687
#library(meteo)
mN <- train.spLearner(meuse["zinc"], covariates=meuse.grid[,c("dist","ffreq")],
parallel=FALSE, lambda=0, nearest=TRUE)
meuse.N <- predict(mN)
## Plot of predictions and prediction error (RMSPE)
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(meuse.N$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
main="Predictions spLearner meteo::near.obs", axes=FALSE, box=FALSE)
points(meuse, pch="+")
plot(raster::raster(meuse.N$pred["model.error"]), col=rev(bpy.colors()),
main="Prediction errors", axes=FALSE, box=FALSE)
points(meuse, pch="+")
par(op)
while (!is.null(dev.list())) dev.off()
## Classification:
SL.library <- c("classif.ranger", "classif.xgboost", "classif.nnTrain")
mC <- train.spLearner(meuse["soil"], covariates=meuse.grid[,c("dist","ffreq")],
SL.library = SL.library, super.learner = "classif.glmnet", parallel=FALSE)
meuse.soil <- predict(mC)
spplot(meuse.soil$pred[grep("prob.", names(meuse.soil$pred))],
col.regions=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
spplot(meuse.soil$pred[grep("error.", names(meuse.soil$pred))],
col.regions=rev(bpy.colors()))
## SIC1997
data("sic1997")
X <- sic1997$swiss1km[c("CHELSA_rainfall","DEM")]
mR <- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1,
nearest = TRUE, parallel=FALSE)
summary(mR@spModel$learner.model$super.model$learner.model)
rainfall1km <- predict(mR, what="mspe")
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(rainfall1km$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
main="Predictions spLearner", axes=FALSE, box=FALSE)
points(sic1997$daily.rainfall, pch="+")
plot(raster::raster(rainfall1km$pred["model.error"]), col=rev(bpy.colors()),
main="Prediction errors", axes=FALSE, box=FALSE)
points(sic1997$daily.rainfall, pch="+")
par(op)
while (!is.null(dev.list())) dev.off()
## Ebergotzen data set
data(eberg_grid)
gridded(eberg_grid) <- ~x+y
proj4string(eberg_grid) <- CRS("+init=epsg:31467")
data(eberg)
eb.s <- sample.int(nrow(eberg), 1400)
eberg <- eberg[eb.s,]
coordinates(eberg) <- ~X+Y
proj4string(eberg) <- CRS("+init=epsg:31467")
## Binomial variable
summary(eberg$TAXGRSC)
eberg$Parabraunerde <- ifelse(eberg$TAXGRSC=="Parabraunerde", 1, 0)
X <- eberg_grid[c("PRMGEO6","DEMSRT6","TWISRT6","TIRAST6")]
mB <- train.spLearner(eberg["Parabraunerde"], covariates=X,
family=binomial(), cov.model = "nugget", parallel=FALSE)
eberg.Parabraunerde <- predict(mB)
plot(raster::raster(eberg.Parabraunerde$pred["prob.1"]),
col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
points(eberg["Parabraunerde"], pch="+")
## Factor variable:
data(eberg)
coordinates(eberg) <- ~X+Y
proj4string(eberg) <- CRS("+init=epsg:31467")
X <- eberg_grid[c("PRMGEO6","DEMSRT6","TWISRT6","TIRAST6")]
mF <- train.spLearner(eberg["TAXGRSC"], covariates=X, parallel=FALSE)
TAXGRSC <- predict(mF)
plot(raster::stack(TAXGRSC$pred[grep("prob.", names(TAXGRSC$pred))]),
col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
plot(raster::stack(TAXGRSC$pred[grep("error.", names(TAXGRSC$pred))]),
col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_BLUE"]], zlim=c(0,0.45))
while (!is.null(dev.list())) dev.off()
## End(Not run)