cv.strk: k-fold cross-validation for spatio-temporal regression...

cv.strk    R Documentation

k-fold cross-validation for spatio-temporal regression kriging

Description

k-fold cross-validation function for spatio-temporal regression kriging based on pred.strk. Currently, only spatial (leave-location-out) cross-validation is implemented. Temporal and spatio-temporal cross-validation will be implemented in the future.

Usage

cv.strk(data,
        obs.col=1,
        data.staid.x.y.z = NULL,
        crs = NA,
        zero.tol=0,
        reg.coef,
        vgm.model,
        sp.nmax=20,
        time.nmax=2,
        type = "LLO",
        k = 5,
        seed = 42,
        folds,
        refit = TRUE,
        output.format = "STFDF",
        parallel.processing = FALSE,
        pp.type = "snowfall",
        cpus=detectCores()-1,
        progress=TRUE,
        ...)

Arguments

data

STFDF-class, STSDF-class, STIDF-class, sf-class, sftime-class, SpatVector-class or data.frame; Contains the target variable (observations) and covariates in space and time used to perform STRK cross-validation. If data is a data.frame object, it should have the following columns: station ID (staid), longitude (x), latitude (y), 3rd component - time, depth, ... (z) of the observation, observation value (obs), and covariates (cov1, cov2, ...). Covariate names should be the same as in reg.coef (see below). If covariates are missing, spatio-temporal ordinary kriging cross-validation is performed.
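As an illustration of the data.frame layout described above, the following minimal sketch builds a small long-format table with one row per station and day. All values are made up, and the covariate names dem and twi are borrowed from the coefficient names mentioned under reg.coef; they are assumptions, not requirements.

```r
# Hypothetical long-format table: 3 stations observed on 2 days.
df <- data.frame(
  staid = rep(c("A", "B", "C"), times = 2),          # station ID
  x     = rep(c(20.1, 20.9, 21.7), times = 2),       # longitude
  y     = rep(c(44.0, 44.5, 43.8), times = 2),       # latitude
  z     = rep(as.Date(c("2011-07-05", "2011-07-06")), each = 3),  # time
  obs   = c(24.1, 22.3, 23.0, 25.2, 23.1, 24.4),     # made-up observations
  dem   = rep(c(120, 450, 300), times = 2),          # made-up static covariate
  twi   = rep(c(8.1, 6.4, 7.2), times = 2)           # made-up static covariate
)

# Column positions that could be supplied as data.staid.x.y.z and obs.col
staid.x.y.z <- match(c("staid", "x", "y", "z"), names(df))
obs.col <- match("obs", names(df))
```

With this layout, staid.x.y.z and obs.col hold the column positions that the data.staid.x.y.z and obs.col arguments expect for a data.frame input.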

obs.col

numeric or character; Column name or number showing position of the observation column in the data. Default is 1.

data.staid.x.y.z

numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and 3rd component - time, depth (z) columns in a data.frame object (e.g. c(1,2,3,4)). If data is an sf-class, sftime-class, or SpatVector-class object, data.staid.x.y.z is used to indicate the staid and z positions. If data is an STFDF-class, STSDF-class, or STIDF-class object, data.staid.x.y.z is used to indicate only the staid position. Default is NULL.

crs

st_crs or crs; Source CRS of data. If data already contains a CRS, this argument is not used. Default is NA.

zero.tol

numeric; A distance value below (or equal to) which locations are considered as duplicates. Default is 0. See rm.dupl. Duplicates are removed to avoid singular covariance matrices in kriging.
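To make the role of zero.tol concrete, here is a minimal base R sketch of distance-based duplicate removal. It uses plain Euclidean distance on made-up coordinates and is only an illustration of the idea; it is not the implementation of rm.dupl.

```r
# Hypothetical stations; A and B share the same coordinates.
pts <- data.frame(staid = c("A", "B", "C"),
                  x = c(0, 0, 3),
                  y = c(0, 0, 4),
                  stringsAsFactors = FALSE)
zero.tol <- 0

# Pairwise Euclidean distances between all locations
d <- as.matrix(dist(pts[, c("x", "y")]))

# Flag a location as a duplicate if any EARLIER location lies within zero.tol
dup <- sapply(seq_len(nrow(pts)), function(i) {
  any(d[i, seq_len(i - 1)] <= zero.tol)
})

# Keep the first occurrence of each location
pts.unique <- pts[!dup, ]
```

Two observations at the same place and time would produce identical rows in the kriging covariance matrix, which is why such locations are dropped before prediction.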

reg.coef

numeric; Vector of named linear regression coefficients. Names of the coefficients (e.g. "Intercept", "temp_geo", "modis", "dem", "twi") will be used to match the appropriate covariates from data. Coefficients for meteorological variables (temperature, precipitation, etc.) can be taken from data(tregcoef) or can be specified by the user.
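The name matching described above can be sketched in base R as follows. The coefficient values and covariate values are made up, and the code only illustrates how a named coefficient vector yields the linear trend part of regression kriging; it is not taken from the package internals.

```r
# Hypothetical named coefficients; names must match covariate columns.
reg.coef <- c(Intercept = 2.5, dem = -0.006, twi = 0.1)

# Made-up covariate values at two locations
covs <- data.frame(dem = c(120, 450), twi = c(8.1, 6.4))

# Every non-intercept coefficient needs a covariate column of the same name
covar.names <- setdiff(names(reg.coef), "Intercept")
stopifnot(all(covar.names %in% names(covs)))

# Design matrix with an intercept column, ordered like the coefficients
X <- cbind(Intercept = 1, as.matrix(covs[, covar.names]))
trend <- as.vector(X %*% reg.coef[colnames(X)])
```

In regression kriging the kriged residuals are then added to this trend to obtain the final prediction.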

vgm.model

StVariogramModel list; Spatio-temporal variogram of regression residuals (or of observations in the case of spatio-temporal ordinary kriging). See vgmST. Spatio-temporal variogram models of residuals for meteorological variables (temperature, precipitation, etc.) can be taken from data(tvgms) or can be specified by the user as a vgmST object.

sp.nmax

numeric; Number of spatially nearest observations that should be used for kriging predictions. If tiling is TRUE (see pred.strk), then it is the number of spatially nearest observations that should be used for each tile. Default is 20.

time.nmax

numeric; Number of temporally nearest observations that should be used for kriging predictions. Default is 2.

type

character; Type of cross-validation: leave-location-out ("LLO"), leave-time-out ("LTO"), or leave-location-time-out ("LLTO"). Default is "LLO". "LTO" and "LLTO" are not implemented yet; they will be added in the future.

k

numeric; Number of random folds that will be created with CreateSpacetimeFolds function. Default is 5.

seed

numeric; Random seed that will be used to generate outer and inner folds with the CreateSpacetimeFolds function. Default is 42.

folds

numeric or character vector or value; Either a single value giving the name or position of the folds column in data, or a vector giving the fold of each observation (row) of data used for cross-validation. If missing, folds will be created with the CreateSpacetimeFolds function.
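To show what leave-location-out folds look like, the sketch below assigns whole stations to folds in base R, so that all observations of a station fall into the same fold. It is a simplified stand-in for illustration, using made-up station IDs, and does not reproduce what CreateSpacetimeFolds does internally.

```r
set.seed(42)
k <- 5

# Hypothetical observations: 10 stations, 3 days each
obs <- data.frame(staid = rep(paste0("S", 1:10), each = 3),
                  day   = rep(1:3, times = 10),
                  stringsAsFactors = FALSE)

# Randomly assign each STATION (not each row) to one of k balanced folds
ids <- unique(obs$staid)
station.fold <- sample(rep(seq_len(k), length.out = length(ids)))
names(station.fold) <- ids

# Propagate the station's fold to all of its observations
obs$folds <- unname(station.fold[obs$staid])
```

Because a held-out station never contributes observations to the training folds, the resulting error reflects prediction at unobserved locations rather than interpolation in time at known ones.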

refit

logical; Whether the linear regression trend and the spatio-temporal variogram should be refitted. The spatio-temporal variogram is fitted with the fit.StVariogram function, using vgm.model as the desired spatio-temporal model. Default is TRUE.

output.format

character; Format of the output, STFDF-class (default), STSDF-class, STIDF-class, data.frame, sf-class, sftime-class, or SpatVector-class.

parallel.processing

logical; If parallel processing is performed. Default is FALSE.

pp.type

character; Type (R package) of parallel processing, "snowfall" (default) or "doParallel".

cpus

numeric; Number of processing units. Default is detectCores()-1.
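The default leaves one core free. A slightly more defensive way to compute a value for cpus is sketched below; it guards against detectCores() returning NA or 1, which the plain default does not. This is a suggestion only, not part of the function.

```r
library(parallel)

# detectCores() may return NA on some platforms
n.cores <- detectCores()

# Leave one core free, but never go below one worker
cpus <- if (is.na(n.cores)) 1L else max(1L, n.cores - 1L)
```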

progress

logical; If progress bar is shown. Default is TRUE.

...

Further arguments passed to krigeST or pred.strk.

Value

An STFDF-class (default), STSDF-class, STIDF-class, data.frame, sf-class, sftime-class, or SpatVector-class object (depending on the output.format argument), with columns:

obs

Observations.

pred

Predictions from cross-validation.

folds

Folds used for cross-validation.

For accuracy metrics see acc.metric.fun function.
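For orientation, common accuracy metrics can be computed directly from the obs and pred columns, as in the base R sketch below. The values are made up, and the R2 shown is the coefficient of determination (1 - SSE/SST), which is an assumption here; acc.metric.fun may use a different definition.

```r
# Made-up observations and cross-validation predictions
obs  <- c(10, 12, 14, 16)
pred <- c(11, 12, 13, 17)

err <- pred - obs

rmse <- sqrt(mean(err^2))                              # root mean squared error
mae  <- mean(abs(err))                                 # mean absolute error
r2   <- 1 - sum(err^2) / sum((obs - mean(obs))^2)      # coefficient of determination
```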

Author(s)

Aleksandar Sekulic asekulic@grf.bg.ac.rs, Milan Kilibarda kili@grf.bg.ac.rs

References

Kilibarda, M., T. Hengl, G. B. M. Heuvelink, B. Graeler, E. Pebesma, M. Percec Tadic, and B. Bajat (2014), Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution, J. Geophys. Res. Atmos., 119, 2294-2313, doi:10.1002/2013JD020803.

See Also

acc.metric.fun pred.strk tregcoef tvgms regdata meteo2STFDF tgeom2STFDF

Examples

library(meteo)
library(sp)
library(spacetime)
library(gstat)
library(plyr)
library(CAST)
library(doParallel)
library(ranger)
# preparing data
data(dtempc) 
data(stations)
data(regdata) # covariates, made by meteo2STFDF function

regdata@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')
data(tvgms) # ST variogram models
data(tregcoef) # MLR coefficients

lonmin <- 18; lonmax <- 22.5; latmin <- 40; latmax <- 46
serbia = point.in.polygon(stations$lon, stations$lat, c(lonmin,lonmax,lonmax,lonmin), 
                          c(latmin,latmin,latmax,latmax))
st = stations[ serbia!=0, ] # stations in Serbia approx.
crs = CRS('+proj=longlat +datum=WGS84')

# create STFDF
stfdf <- meteo2STFDF(obs = dtempc,
                     stations = st,
                     crs = crs)

# Cross-validation for mean temperature for days "2011-07-05" and "2011-07-06" 
# global model is used for regression and variogram

# Overlay observations with covariates
time <- index(stfdf@time)
covariates.df <- as.data.frame(regdata)
names_covar <- names(tregcoef[[1]])[-1]
for (covar in names_covar){
  nrowsp <- length(stfdf@sp)
  regdata@sp=as(regdata@sp,'SpatialPixelsDataFrame')
  ov <- sapply(time, function(i) 
    if (covar %in% names(regdata@data)) {
      if (as.Date(i) %in% as.Date(index(regdata@time))) {
        over(stfdf@sp, as(regdata[, i, covar], 'SpatialPixelsDataFrame'))[, covar]
      } else {
        rep(NA, length(stfdf@sp))
      }
    } else {
      over(stfdf@sp, as(regdata@sp[covar], 'SpatialPixelsDataFrame'))[, covar]
    }
  )
  ov <- as.vector(ov)
  if (all(is.na(ov))) {
    stop(paste('There is no overlay of data with covariates!', sep = ""))
  }
  stfdf@data[covar] <- ov
}

# Remove stations out of covariates
for (covar in names_covar){
  # count NAs per stations
  numNA <- apply(matrix(stfdf@data[,covar],
                        nrow=nrowsp,byrow= FALSE), MARGIN=1,
                 FUN=function(x) sum(is.na(x)))
  rem <- numNA != length(time)
  stfdf <-  stfdf[rem,drop= FALSE]
}

# Remove dates out of covariates
rm.days <- c()
for (t in 1:length(time)) {
  if(sum(complete.cases(stfdf[, t]@data)) == 0) {
    rm.days <- c(rm.days, t)
  }
}
if(!is.null(rm.days)){
  stfdf <- stfdf[,-rm.days]
}

### Example with STFDF and without parallel processing and without refitting of variogram
results <- cv.strk(data = stfdf,
                   obs.col = 1, # "tempc"
                   data.staid.x.y.z = c(1,NA,NA,NA),
                   reg.coef = tregcoef[[1]],
                   vgm.model = tvgms[[1]],
                   sp.nmax = 20,
                   time.nmax = 2,
                   type = "LLO",
                   k = 5,
                   seed = 42,
                   refit = FALSE,
                   progress = TRUE
)

# stplot(results[,,"pred"])
summary(results)
# accuracy
acc.metric.fun(results@data$obs, results@data$pred, "R2")
acc.metric.fun(results@data$obs, results@data$pred, "RMSE")
acc.metric.fun(results@data$obs, results@data$pred, "MAE")
acc.metric.fun(results@data$obs, results@data$pred, "CCC")


meteo documentation built on Nov. 23, 2023, 3:01 p.m.