knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The mcgf
package contains useful functions to simulate Markov chain Gaussian fields (MCGF) and regime-switching Markov chain Gaussian fields (RS-MCGF) with covariance structures of the Gneiting class [@Gneiting2002]. It also provides useful tools to estimate the parameters by weighted least squares (WLS) and maximum likelihood estimation (MLE). The mcgf
function can be used to fit covariance models and obtain Kriging forecasts. A typical workflow for fitting a non-regime-switching mcgf is given below.
mcgf
object by providing a dataset and the corresponding locations/distancesWe will demonstrate the use of mcgf
by an example below.
The Ireland wind data contains daily wind speeds for 1961-1978 at 11 synoptic meteorological stations in the Republic of Ireland. It is available in the gstat
package and is also imported to mcgf
. To view the details on this dataset, run help(wind)
. The object wind
contains the wind speeds data as well as the locations of the stations.
library(mcgf) data(wind) head(wind$data) wind$locations
To fit covariance models on wind
, the first step is to load the data set and de-trend the data to have a mean-zero time series. We will follow the data de-trending procedure in @Gneiting2006a.
lubridate
package.# install.packages("lubridate") library(lubridate) ind_leap <- month(wind$data$time) == 2 & day(wind$data$time) == 29 wind_de <- wind$data[!ind_leap, ] wind_de[, -1] <- sqrt(wind_de[, -1])
is_train <- year(wind_de$time) <= 1970 wind_train <- wind_de[is_train, ] wind_test <- wind_de[!is_train, ]
dplyr
package for this task.# install.packages("dplyr") library(dplyr) # Estimate annual trend avg_train <- tibble( month = month(wind_train$time), day = day(wind_train$time), ws = rowMeans(wind_train[, -1]) ) trend_train <- avg_train %>% group_by(month, day) %>% summarise(trend = mean(ws)) # Subtract annual trend trend_train2 <- left_join(avg_train, trend_train, by = c("month", "day"))$trend wind_train[, -1] <- wind_train[, -1] - trend_train2
# Subtract station-wise mean mean_train <- colMeans(wind_train[, -1]) wind_train[, -1] <- sweep(wind_train[, -1], 2, mean_train) wind_trend <- list( annual = as.data.frame(trend_train), mean = mean_train )
avg_test <- tibble( month = month(wind_test$time), day = day(wind_test$time) ) trend_train3 <- left_join(avg_test, trend_train, by = c("month", "day"))$trend wind_test[, -1] <- wind_test[, -1] - trend_train3 wind_test[, -1] <- sweep(wind_test[, -1], 2, mean_train)
We will first fit pure spatial and temporal models, then the fully symmetric model, and finally the general stationary model. First, we will create an mcgf
object and calculate auto- and cross- correlations.
wind_mcgf <- mcgf(wind_train[, -1], locations = wind$locations, longlat = TRUE) wind_mcgf <- add_acfs(wind_mcgf, lag_max = 3) wind_mcgf <- add_ccfs(wind_mcgf, lag_max = 3)
Here acfs
actually refers to the mean auto-correlations across the stations for each time lag. To view the calculated acfs
, we can run:
acfs(wind_mcgf)
Similarly, we can view the ccfs
by:
ccfs(wind_mcgf)
The pure spatial model can be fitted using the fit_base
function. The results are actually obtained from the optimization function nlminb
.
fit_spatial <- fit_base( wind_mcgf, model = "spatial", lag = 3, par_init = c(nugget = 0.1, c = 0.001), par_fixed = c(gamma = 0.5) ) fit_spatial$fit
Here we set gamma
to be 0.5 and it is not estimated along with c
or nugget
. By default mcgf
provides two optimization functions: nlminb
and optim
. Other optimization functions are also supported as long as their first two arguments are initial values for the parameters and a function to be minimized respectively (same as that of optim
and nlminb
). Also, if the argument names for upper and lower bounds are not upper
or lower
, we can create a simple wrapper to "change" them.
library(Rsolnp) solnp2 <- function(pars, fun, lower, upper, ...) { solnp(pars, fun, LB = lower, UB = upper, ...) } fit_spatial2 <- fit_base( wind_mcgf, model = "spatial", lag = 3, par_init = c(nugget = 0.1, c = 0.001), par_fixed = c(gamma = 0.5), optim_fn = "other", other_optim_fn = "solnp2" ) fit_spatial2$fit
The pure temporal can also be fitted by fit_base
:
fit_temporal <- fit_base( wind_mcgf, model = "temporal", lag = 3, par_init = c(a = 0.5, alpha = 0.5) ) fit_temporal$fit
Before fitting the fully symmetric model, we need to store the fitted spatial and temporal models to wind_mcgf
using add_base
:
wind_mcgf <- add_base(wind_mcgf, fit_s = fit_spatial, fit_t = fit_temporal, sep = T )
We can also fit the pure spatial and temporal models all at once by fitting a separable model:
fit_sep <- fit_base( wind_mcgf, model = "sep", lag = 3, par_init = c(nugget = 0.1, c = 0.001, a = 0.5, alpha = 0.5), par_fixed = c(gamma = 0.5) ) fit_sep$fit
Once can also store this model to wind_mcgf
, but to be consistent with @Gneiting2006a, we will just use the estimates from the pure spatial and temporal models to estimate the interaction parameter beta
.
When holding other parameters constant, the parameter beta
can be estimated by:
par_sep <- c(fit_spatial$fit$par, fit_temporal$fit$par, gamma = 0.5) fit_fs <- fit_base( wind_mcgf, model = "fs", lag = 3, par_init = c(beta = 0), par_fixed = par_sep ) fit_fs$fit
At this stage, we have fitted the base model, and we will store the fully symmetric model as the base model and print the base model:
wind_mcgf <- add_base(wind_mcgf, fit_base = fit_fs, old = TRUE) model(wind_mcgf, model = "base", old = TRUE)
The old = TRUE
in add_base
keeps the fitted pure spatial and temporal models for records, and they are not used for any further steps. It is recommended to keep the old model not only for reproducibility, but to keep a history of fitted models.
We will fit a Lagrangian correlation function to model the westerly wind:
fit_lagr <- fit_lagr(wind_mcgf, model = "lagr_tri", par_init = c(v1 = 200, lambda = 0.1), par_fixed = c(v2 = 0, k = 2) ) fit_lagr$fit
Finally we will store this model to wind_mcgf
using add_lagr
and then print the final model:
wind_mcgf <- add_lagr(wind_mcgf, fit_lagr = fit_lagr) model(wind_mcgf, old = TRUE)
For the test dataset, we can obtain the simple kriging forecasts and intervals for the empirical model, base model, and general stationary model using the krige
function:
krige_emp <- krige( x = wind_mcgf, newdata = wind_test[, -1], model = "empirical", interval = TRUE ) krige_base <- krige( x = wind_mcgf, newdata = wind_test[, -1], model = "base", interval = TRUE ) krige_stat <- krige( x = wind_mcgf, newdata = wind_test[, -1], model = "all", interval = TRUE )
Next, we can compute the root mean square error (RMSE), mean absolute error (MAE), and the realized percentage of observations falling outside the 95\% PI (POPI) for these models on the test dataset.
# RMSE rmse_emp <- sqrt(colMeans((wind_test[, -1] - krige_emp$fit)^2, na.rm = T)) rmse_base <- sqrt(colMeans((wind_test[, -1] - krige_base$fit)^2, na.rm = T)) rmse_stat <- sqrt(colMeans((wind_test[, -1] - krige_stat$fit)^2, na.rm = T)) rmse <- c( "Empirical" = mean(rmse_emp), "Base" = mean(rmse_base), "STAT" = mean(rmse_stat) ) rmse
mae_emp <- colMeans(abs(wind_test[, -1] - krige_emp$fit), na.rm = T) mae_base <- colMeans(abs(wind_test[, -1] - krige_base$fit), na.rm = T) mae_stat <- colMeans(abs(wind_test[, -1] - krige_stat$fit), na.rm = T) mae <- c( "Empirical" = mean(mae_emp), "Base" = mean(mae_base), "STAT" = mean(mae_stat) ) mae
# POPI popi_emp <- colMeans( wind_test[, -1] < krige_emp$lower | wind_test[, -1] > krige_emp$upper, na.rm = T ) popi_base <- colMeans( wind_test[, -1] < krige_base$lower | wind_test[, -1] > krige_base$upper, na.rm = T ) popi_stat <- colMeans( wind_test[, -1] < krige_stat$lower | wind_test[, -1] > krige_stat$upper, na.rm = T ) popi <- c( "Empirical" = mean(popi_emp), "Base" = mean(popi_base), "STAT" = mean(popi_stat) ) popi
We provide functionalists for computing simple Kriging forecasts for new locations. The associated function is krige_new
, and users can either supply the coordinates for the new locations or the distance matrices for all locations. Hypotheticaly, suppose a new location is at ($9^\circ$W, $52^\circ$N), then its kriging forecasts along with forecasts for the rest of the stations for the general stationary model can be obtained as follows.
krige_stat_new <- krige_new( x = wind_mcgf, newdata = wind_test[, -1], locations_new = c(-9, 52), model = "all", interval = TRUE ) head(krige_stat_new$fit)
Below is a time-series plot for the first 100 forecasts for New_1
:
plot.ts(krige_stat_new$fit[1:100, 12], ylab = "Wind Speed for New_1")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.