train.spLearner-methods: Train a spatial prediction and/or interpolation model using Ensemble Machine Learning

Description

Automated spatial predictions and/or interpolation using Ensemble Machine Learning. Extends functionality of the mlr package. Suitable for predicting numeric, binomial and factor-type variables.
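
For instance, a minimal regression call (a sketch condensed from the Examples below; it assumes the meuse demo data from the sp package and leaves most arguments at their defaults):

library(landmap)
demo(meuse, echo=FALSE)   ## loads the meuse points and meuse.grid covariates
m <- train.spLearner(meuse["zinc"],
      covariates=meuse.grid[,c("dist","ffreq")],
      parallel=FALSE, lambda=0)
meuse.y <- predict(m)     ## predictions plus per-pixel model error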

Usage

## S4 method for signature 'SpatialPointsDataFrame,ANY,SpatialPixelsDataFrame'
train.spLearner(
  observations,
  formulaString,
  covariates,
  SL.library,
  family = stats::gaussian(),
  method = "stack.cv",
  predict.type,
  super.learner = "regr.lm",
  subsets = 5,
  lambda = 0.5,
  cov.model = "exponential",
  subsample = 10000,
  parallel = "multicore",
  oblique.coords = TRUE,
  nearest = FALSE,
  buffer.dist = FALSE,
  theta.list = seq(0, 180, length.out = 14) * pi/180,
  spc = TRUE,
  id = NULL,
  weights = NULL,
  n.obs = 10,
  ...
)

Arguments

observations

SpatialPointsDataFrame.

formulaString

Model formula (class ANY); can be omitted, in which case it is generated from the covariate names (see Examples).

covariates

SpatialPixelsDataFrame.

SL.library

List of learners (see Note for the defaults).

family

Model family, e.g. stats::gaussian().

method

Ensemble stacking method (see mlr::makeStackedLearner); usually "stack.cv".

predict.type

Prediction type: "prob" or "response".

super.learner

Ensemble stacking meta-learner; usually "regr.lm".

subsets

Number of subsets for repeated cross-validation.

lambda

Target variable transformation parameter (e.g. 0, 0.5 or 1).

cov.model

Covariance model used for variogram fitting ("exponential" by default).

subsample

Maximum size of the training data; larger data sets are randomly subset to this many points (10000 by default).

parallel

Initiate parallel processing ("multicore" by default).

oblique.coords

Specifies whether to use oblique coordinates as covariates.

nearest

Specifies whether to use the values of and distances to the nearest observations, i.e. the method of Sekulić et al. (2020).

buffer.dist

Specifies whether to use buffer distances to points as covariates.

theta.list

List of angles (in radians) used to derive the oblique coordinates.

spc

Specifies whether to apply a principal components transformation to the covariates.

id

ID column name used to control clusters of data.

weights

Optional per-row weights that the learners use to account for variable data quality.

n.obs

Number of nearest observations to be found by meteo::near.obs (10 by default).

...

Other arguments passed on to mlr::makeStackedLearner.

Value

Object of class "spLearner", which contains the fitted model, the variogram model and the spatial grid used for cross-validation.
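
For a fitted model m (as in the Examples below), these components can be inspected directly:

summary(m@spModel$learner.model$super.model$learner.model)  ## meta-learner fit
str(m@vgmModel$observations@data)                           ## regression matrix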

Note

By default, oblique coordinates (rotated coordinates), as described in Møller et al. (2020; doi: 10.5194/soil-6-269-2020), are used as covariates to account for the geographical distribution of values. By setting nearest = TRUE, the values of and distances to the nearest observations are used instead (see Sekulić et al., 2020; doi: 10.3390/rs12101687); this method closely resembles geostatistical interpolators such as kriging. Buffer (geographical) distances can be added by setting buffer.dist = TRUE. Using oblique coordinates and/or buffer distances is not recommended for point data sets with distinct spatial clustering. The effects of adding geographical distances to the model are explained in detail in Hengl et al. (2018; doi: 10.7717/peerj.5518) and Sekulić et al. (2020).

Default learners used for regression are c("regr.ranger", "regr.ksvm", "regr.nnet", "regr.cvglmnet"); default learners used for classification / binomial variables are c("classif.ranger", "classif.svm", "classif.multinom"), with predict.type = "prob". When method = "stack.cv" is used, each training and prediction round can produce somewhat different results due to the randomization of the CV folds.

Prediction errors are derived by default using the forestError package method described in Lu & Hardin (2021). Alternatively, the quantreg (quantile regression) option of the ranger package (Meinshausen, 2006) can be used.
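
The oblique coordinates are projections of the grid coordinates onto axes rotated by the angles in theta.list. A minimal sketch of the transformation from Møller et al. (2020), not the package's internal implementation, using the meuse.grid object from the Examples:

theta.list <- seq(0, 180, length.out = 14) * pi/180
xy <- sp::coordinates(meuse.grid)
## one rotated coordinate per angle: x*cos(theta) + y*sin(theta)
oblique <- sapply(theta.list, function(theta)
      xy[,1]*cos(theta) + xy[,2]*sin(theta))
colnames(oblique) <- paste0("rot", seq_along(theta.list))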

Author(s)

Tom Hengl

References

Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B. M., Gräler, B. (2018). Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ, 6:e5518. doi: 10.7717/peerj.5518

Lu, B., Hardin, J. (2021). A unified framework for random forest prediction error estimation. Journal of Machine Learning Research, 22(8), 1–41.

Meinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7, 983–999.

Møller, A. B., Beucher, A. M. B., Pouladi, N., Greve, M. H. (2020). Oblique geographic coordinates as covariates for digital soil mapping. SOIL, 6, 269–289. doi: 10.5194/soil-6-269-2020

Sekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M., Bajat, B. (2020). Random Forest Spatial Interpolation. Remote Sensing, 12(10), 1687. doi: 10.3390/rs12101687

Examples

library(landmap)
library(rgdal)
library(mlr)
library(rpart)
library(nnet)
demo(meuse, echo=FALSE)  ## loads the meuse points and meuse.grid covariates (sp)
## Regression:
sl = c("regr.rpart", "regr.nnet", "regr.glm")
system.time( m <- train.spLearner(meuse["lead"],
      covariates=meuse.grid[,c("dist","ffreq")],
      oblique.coords = FALSE, lambda=0,
      parallel=FALSE, SL.library=sl) )
summary(m@spModel$learner.model$super.model$learner.model)
## Not run: 
library(plotKML)
## regression-matrix:
str(m@vgmModel$observations@data)
meuse.y <- predict(m, error.type="weighted.sd")
plot(raster::raster(meuse.y$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
   main="Predictions spLearner", axes=FALSE, box=FALSE)

## Regression with default settings:
m <- train.spLearner(meuse["zinc"], covariates=meuse.grid[,c("dist","ffreq")],
        parallel=FALSE, lambda = 0)
## Ensemble model (meta-learner):
summary(m@spModel$learner.model$super.model$learner.model)
meuse.y <- predict(m)
## Plot of predictions and prediction error (RMSPE)
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(meuse.y$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
   main="Predictions spLearner", axes=FALSE, box=FALSE)
points(meuse, pch="+")
plot(raster::raster(meuse.y$pred["model.error"]), col=rev(bpy.colors()),
   main="Prediction errors", axes=FALSE, box=FALSE)
points(meuse, pch="+")
par(op)
## Plot of prediction intervals:
pts = list("sp.points", meuse, pch = "+", col="black")
spplot(meuse.y$pred[,c("q.lwr","q.upr")], col.regions=plotKML::R_pal[["rainbow_75"]][4:20],
   sp.layout = list(pts),
   main="Prediction intervals (alpha = 0.318)")

## Method of Sekulic et al. (2020): https://doi.org/10.3390/rs12101687
## nearest=TRUE uses meteo::near.obs internally
#library(meteo)
mN <- train.spLearner(meuse["zinc"], covariates=meuse.grid[,c("dist","ffreq")],
        parallel=FALSE, lambda=0, nearest=TRUE)
meuse.N <- predict(mN)
## Plot of predictions and prediction error (RMSPE)
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(meuse.N$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
   main="Predictions spLearner meteo::near.obs", axes=FALSE, box=FALSE)
points(meuse, pch="+")
plot(raster::raster(meuse.N$pred["model.error"]), col=rev(bpy.colors()),
   main="Prediction errors", axes=FALSE, box=FALSE)
points(meuse, pch="+")
par(op)

## Classification:
SL.library <- c("classif.ranger", "classif.xgboost", "classif.nnTrain")
mC <- train.spLearner(meuse["soil"], covariates=meuse.grid[,c("dist","ffreq")],
   SL.library = SL.library, super.learner = "classif.glmnet", parallel=FALSE)
meuse.soil <- predict(mC)
spplot(meuse.soil$pred[grep("prob.", names(meuse.soil$pred))],
        col.regions=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
spplot(meuse.soil$pred[grep("error.", names(meuse.soil$pred))],
         col.regions=rev(bpy.colors()))

## SIC1997
data("sic1997")
X <- sic1997$swiss1km[c("CHELSA_rainfall","DEM")]
mR <- train.spLearner(sic1997$daily.rainfall, covariates=X, lambda=1,
         nearest = TRUE, parallel=FALSE)
summary(mR@spModel$learner.model$super.model$learner.model)
rainfall1km <- predict(mR, what="mspe")
op <- par(mfrow=c(1,2), oma=c(0,0,0,1), mar=c(0,0,4,3))
plot(raster::raster(rainfall1km$pred["response"]), col=plotKML::R_pal[["rainbow_75"]][4:20],
    main="Predictions spLearner", axes=FALSE, box=FALSE)
points(sic1997$daily.rainfall, pch="+")
plot(raster::raster(rainfall1km$pred["model.error"]), col=rev(bpy.colors()),
    main="Prediction errors", axes=FALSE, box=FALSE)
points(sic1997$daily.rainfall, pch="+")
par(op)

## Ebergotzen data set
data(eberg_grid)
gridded(eberg_grid) <- ~x+y
proj4string(eberg_grid) <- CRS("+init=epsg:31467")
data(eberg)
eb.s <- sample.int(nrow(eberg), 1400)
eberg <- eberg[eb.s,]
coordinates(eberg) <- ~X+Y
proj4string(eberg) <- CRS("+init=epsg:31467")
## Binomial variable
summary(eberg$TAXGRSC)
eberg$Parabraunerde <- ifelse(eberg$TAXGRSC=="Parabraunerde", 1, 0)
X <- eberg_grid[c("PRMGEO6","DEMSRT6","TWISRT6","TIRAST6")]
mB <- train.spLearner(eberg["Parabraunerde"], covariates=X,
   family=binomial(), cov.model = "nugget", parallel=FALSE)
eberg.Parabraunerde <- predict(mB)
plot(raster::raster(eberg.Parabraunerde$pred["prob.1"]),
   col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
points(eberg["Parabraunerde"], pch="+")

## Factor variable:
data(eberg)
coordinates(eberg) <- ~X+Y
proj4string(eberg) <- CRS("+init=epsg:31467")
X <- eberg_grid[c("PRMGEO6","DEMSRT6","TWISRT6","TIRAST6")]
mF <- train.spLearner(eberg["TAXGRSC"], covariates=X, parallel=FALSE)
TAXGRSC <- predict(mF)
plot(raster::stack(TAXGRSC$pred[grep("prob.", names(TAXGRSC$pred))]),
    col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_RED"]], zlim=c(0,1))
plot(raster::stack(TAXGRSC$pred[grep("error.", names(TAXGRSC$pred))]),
    col=plotKML::SAGA_pal[["SG_COLORS_YELLOW_BLUE"]], zlim=c(0,0.45))

## End(Not run)
